(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)

(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)

(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)

(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)

With !406, we accidentally broke ARM 32-bit NEON builds, since
`vsqrt_f32` is only available on 64-bit (AArch64).
Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.
Note that several of the 32-bit NEON packet tests are currently
failing, either due to denormal handling (the NEON versions flush
to zero, but the scalar paths don't) or due to accuracy (e.g. sin/cos).
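As a rough illustration of what "better handling of edge cases" can mean for a reciprocal square root, here is a scalar C++ sketch. It is not Eigen's `prsqrt`: `rsqrt_estimate` and `rsqrt_with_edge_cases` are hypothetical names, the bit-level initial guess merely stands in for a hardware estimate instruction, and the chosen conventions (±0 gives ±inf, +inf gives 0, negative or NaN gives NaN, denormals rescaled before refinement) are assumptions modeled on what `1.0f / std::sqrt(x)` yields in IEEE arithmetic.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical initial guess for 1/sqrt(x), valid for positive, finite,
// normal x (the classic bit-level trick, a few percent relative error).
// It stands in for a hardware estimate instruction in this sketch.
inline float rsqrt_estimate(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  float y;
  std::memcpy(&y, &bits, sizeof y);
  return y;
}

// Hypothetical scalar reciprocal square root with explicit edge cases.
inline float rsqrt_with_edge_cases(float x) {
  if (std::isnan(x) || x < 0.0f) return std::numeric_limits<float>::quiet_NaN();
  // +0 -> +inf, -0 -> -inf (note -0.0f < 0.0f is false, so -0 lands here).
  if (x == 0.0f) return std::copysign(std::numeric_limits<float>::infinity(), x);
  if (std::isinf(x)) return 0.0f;
  // Denormals: scale by 2^24 (exact) so the estimate is meaningful, then
  // undo it using rsqrt(x) = rsqrt(x * 2^24) * 2^12.
  float scale = 1.0f;
  if (x < std::numeric_limits<float>::min()) {
    x *= 16777216.0f;  // 2^24
    scale = 4096.0f;   // 2^12
  }
  float y = rsqrt_estimate(x);
  // Three Newton-Raphson steps: y <- y * (1.5 - 0.5 * x * y^2).
  for (int i = 0; i < 3; ++i) y = y * (1.5f - 0.5f * x * y * y);
  return y * scale;
}
```

The special cases are resolved before any arithmetic, so the Newton update only ever runs on positive, finite, normal inputs, where the initial estimate is within a few percent.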

Forgot to test this. Fixes bug introduced in !416.

The original implementation saturates if the input does not fit into an integer
type. Here we fix this by returning the input unchanged when it does not have
enough precision to have a fractional part.
Also added `pceil` for NEON.
Fixes #1969.
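A scalar C++ sketch of the non-saturating approach described above, for `float` only. `floor_no_saturate` is a hypothetical name, not Eigen's `pfloor`; the key assumption is IEEE binary32, whose 24-bit significand means any value with a magnitude of at least 2^23 is already an integer and can be returned as-is instead of being pushed through a saturating float-to-int conversion.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical non-saturating floor for IEEE binary32 floats.
inline float floor_no_saturate(float x) {
  const float kNoFraction = 8388608.0f;  // 2^23: from here on every float is an integer
  // Large magnitudes, infinities and NaN (for which the comparison is false)
  // pass through untouched instead of saturating in a float -> int conversion.
  if (!(std::fabs(x) < kNoFraction)) return x;
  // |x| < 2^23 fits in int32_t, so this truncation toward zero is well defined.
  float t = static_cast<float>(static_cast<std::int32_t>(x));
  // Truncation rounds negative non-integers up; step down by one for those.
  float r = (t > x) ? t - 1.0f : t;
  // Preserve the sign of zero, e.g. floor(-0.0f) == -0.0f.
  return std::copysign(r, x);
}
```

A ceiling follows the same pattern with the adjustment mirrored (`t < x ? t + 1.0f : t`); the final `copysign` then also yields `-0.0f` for inputs in (-1, 0).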

reductions.

in certain situations.

1. Compute only about half of the twiddle factors and derive the rest from complex-conjugate symmetry, to save time.
2. Compute all twiddles in double, since that gives the maximum achievable precision for float transforms.
3. Reduce all angles to the range 0 < angle < pi/4, which gives even more precision.
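The three steps can be sketched in C++ as follows. This is an illustrative reconstruction, not the actual Eigen FFT code: `unit_root` and `make_twiddles` are hypothetical names, the forward-transform convention w_k = exp(-2*pi*i*k/n) is an assumption, and the reduction uses exact integer quadrant arithmetic so that `std::cos`/`std::sin` only ever see an angle in [0, pi/4].

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// exp(+2*pi*i*k/n) for 0 <= k < n, evaluated in double. Integer arithmetic
// picks the quadrant exactly, and cos/sin are only ever called on an angle
// in [0, pi/4], using cos(pi/2 - b) = sin(b) for the upper half of a quadrant.
inline std::complex<double> unit_root(std::size_t k, std::size_t n) {
  const double kHalfPi = 1.57079632679489661923;
  const std::size_t q = (4 * k) / n;  // quadrant 0..3
  const std::size_t r = (4 * k) % n;  // angle = (q + r / n) * pi/2
  double c, s;
  if (2 * r <= n) {  // residual angle a in [0, pi/4]
    const double a = kHalfPi * static_cast<double>(r) / static_cast<double>(n);
    c = std::cos(a);
    s = std::sin(a);
  } else {  // residual angle is pi/2 - b with b in (0, pi/4)
    const double b = kHalfPi * static_cast<double>(n - r) / static_cast<double>(n);
    c = std::sin(b);
    s = std::cos(b);
  }
  switch (q) {  // rotate by q quarter turns
    case 0: return {c, s};
    case 1: return {-s, c};
    case 2: return {-c, -s};
    default: return {s, -c};
  }
}

// Forward twiddles exp(-2*pi*i*k/n): about half are computed directly, the rest
// come from conjugate symmetry w[n - k] = conj(w[k]).
template <typename Scalar>
std::vector<std::complex<Scalar>> make_twiddles(std::size_t n) {
  std::vector<std::complex<Scalar>> tw(n);
  if (n == 0) return tw;
  for (std::size_t k = 0; k <= n / 2; ++k) {
    const std::complex<double> w = std::conj(unit_root(k, n));
    // Round from double to Scalar exactly once.
    tw[k] = std::complex<Scalar>(static_cast<Scalar>(w.real()), static_cast<Scalar>(w.imag()));
  }
  for (std::size_t k = n / 2 + 1; k < n; ++k) tw[k] = std::conj(tw[n - k]);
  return tw;
}
```

For `make_twiddles<float>`, all trigonometry happens in double and the only float rounding is the final conversion of each directly computed entry; the mirrored entries are exact conjugates of those.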

The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.
Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and an
`Eigen::internal::invoke_result` implementation with a fallback for
pre-C++17.
Replaced the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.
For completeness, added nullary op support for C++03.
Fixes #1850.
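The fallback pattern can be sketched in C++. The macro and namespace names (`MY_HAS_STD_INVOKE_RESULT`, `my_internal`) are hypothetical, detection relies on the standard feature-test macro `__cpp_lib_is_invocable`, and the fallback assumes a C++11 or C++14 compiler where `std::result_of` is still available; this sketch does not cover C++03.

```cpp
#include <type_traits>

// Detect std::invoke_result via its standard feature-test macro
// (provided by <type_traits> in C++17 and later).
#if defined(__cpp_lib_is_invocable) && __cpp_lib_is_invocable >= 201703L
#define MY_HAS_STD_INVOKE_RESULT 1
#else
#define MY_HAS_STD_INVOKE_RESULT 0
#endif

namespace my_internal {  // hypothetical namespace

#if MY_HAS_STD_INVOKE_RESULT
// C++17 and later: never touch std::result_of (deprecated, removed in C++20).
template <typename F, typename... Args>
struct invoke_result {
  typedef typename std::invoke_result<F, Args...>::type type;
};
#else
// C++11/14 fallback: std::result_of is still available and not deprecated.
template <typename F, typename... Args>
struct invoke_result {
  typedef typename std::result_of<F(Args...)>::type type;
};
#endif

}  // namespace my_internal
```

Under `-std=c++17` or later the first branch is selected, so no deprecated `std::result_of` instantiation remains. The nullary form `invoke_result<F>` works in both branches.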

to non-const when using vector_pair with certain built-ins.

HIP does not support new/delete on device, so the test is skipped.

CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (UMFPACK)
does not match the name of the calling package (Umfpack). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.

Accuracy is too poor: it requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.
Fixes #2094.
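The cost argument can be made concrete with the Newton-Raphson error recurrence for 1/sqrt(x): if y = (1 + e)/sqrt(x), one step y <- y*(1.5 - 0.5*x*y^2) leaves a relative error of about 1.5*e^2. The sketch below simulates this in double. The starting error of 2^-8 is an assumption standing in for a low-precision hardware estimate, not a measured property of any specific instruction, and `rsqrt_newton_errors` is a hypothetical helper.

```cpp
#include <cmath>
#include <vector>

// Relative error of y ~ 1/sqrt(x) before and after each Newton-Raphson step,
// starting from an estimate with relative error rel_err0. Computed in double
// so that rounding does not mask the convergence being measured.
inline std::vector<double> rsqrt_newton_errors(double x, double rel_err0, int iterations) {
  const double exact = 1.0 / std::sqrt(x);
  double y = exact * (1.0 + rel_err0);  // simulated low-precision estimate
  std::vector<double> errors{std::fabs(y - exact) / exact};
  for (int i = 0; i < iterations; ++i) {
    y = y * (1.5 - 0.5 * x * y * y);  // Newton step for f(y) = 1/y^2 - x
    errors.push_back(std::fabs(y - exact) / exact);
  }
  return errors;
}
```

With a 2^-8 start, one step gives roughly 2.3e-5 (about 15 bits) and two steps roughly 8e-10, past the 2^-24 (about 6e-8) needed for float, which matches the "at least two Newton iterations" requirement.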

Added an `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on the GPU.
`std::hash<float>` is not a device function, so it cannot be used by
`std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only
define the specialization if `EIGEN_HAS_STD_HASH` is set. Same for `half`.
Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability and
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
Replaced a couple of C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang. Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
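A minimal sketch of the guarded `std::hash` pattern for a reduced-precision type. `Half16`, `MY_HAS_STD_HASH`, and the exact guard condition are hypothetical (here: C++11 or later, and not a CUDA or HIP device compilation pass, detected via `__CUDA_ARCH__` and `__HIP_DEVICE_COMPILE__`). The point is only that the specialization delegates to `std::hash<float>` and is therefore host-only.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical 16-bit float stand-in; it stores a float for simplicity.
struct Half16 {
  float value;
  float to_float() const { return value; }
};

// Hypothetical guard: C++11 or later and not a CUDA/HIP device compilation pass.
#if __cplusplus >= 201103L && !defined(__CUDA_ARCH__) && !defined(__HIP_DEVICE_COMPILE__)
#define MY_HAS_STD_HASH 1
#else
#define MY_HAS_STD_HASH 0
#endif

#if MY_HAS_STD_HASH
namespace std {
// Specializing std::hash for a user-defined type is permitted.
// No device annotation here: std::hash<float> is a host-only function.
template <>
struct hash<Half16> {
  std::size_t operator()(const Half16& h) const {
    return std::hash<float>()(h.to_float());
  }
};
}  // namespace std
#endif
```

With the specialization in place, `Half16` can key a `std::unordered_set` or `std::unordered_map` in host code, while device code never sees a hash it could not call.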

double that
make `pow<double>` accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current
implementation.
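Claims such as "accurate to 1 ULP" are typically checked by counting the representable doubles between a result and a reference. The helper below is a common way to do that, sketched in C++; `ordered_bits` and `ulp_distance` are hypothetical names, infinities are treated as the next step past the largest finite value, and NaN inputs report the maximum distance.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a double onto a signed integer line on which adjacent representable
// doubles differ by exactly 1 (and +0.0 and -0.0 coincide).
inline std::int64_t ordered_bits(double x) {
  std::int64_t i;
  std::memcpy(&i, &x, sizeof i);
  // Negative doubles are stored as sign + magnitude; flip them so that larger
  // magnitudes map to more negative integers.
  return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of ULP steps between a and b; NaN inputs report the maximum value.
inline std::uint64_t ulp_distance(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) return std::numeric_limits<std::uint64_t>::max();
  const std::int64_t ia = ordered_bits(a);
  const std::int64_t ib = ordered_bits(b);
  // Modular unsigned subtraction yields the exact difference, which fits in 64 bits.
  return ia > ib ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
                 : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}
```

To evaluate a `pow` implementation this way, one would take the maximum `ulp_distance` over many inputs against a higher-precision reference.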

The previous implementation caused a buffer overflow while trying to calculate
non-zero counts for columns that no longer exist.
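One way to avoid that kind of overflow, sketched in C++ on a simplified compressed-column layout: only columns present in both the old and the new shape are read. The layout, the function name `nonzero_counts_after_resize`, and the restriction to column resizing (rows are assumed unchanged) are illustrative assumptions rather than the actual fixed code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical compressed-column layout: the entries of column j occupy
// positions [outer[j], outer[j + 1]) of the inner-index/value arrays, so
// outer.size() == cols + 1. Row changes are ignored in this sketch.
inline std::vector<std::size_t> nonzero_counts_after_resize(
    const std::vector<std::size_t>& outer, std::size_t new_cols) {
  const std::size_t old_cols = outer.empty() ? 0 : outer.size() - 1;
  std::vector<std::size_t> counts(new_cols, 0);  // columns added by the resize start empty
  // Read only columns that exist in both shapes: iterating up to old_cols when
  // shrinking would write past the end of `counts`.
  const std::size_t kept = std::min(old_cols, new_cols);
  for (std::size_t j = 0; j < kept; ++j) counts[j] = outer[j + 1] - outer[j];
  return counts;
}
```

For example, with `outer = {0, 2, 2, 5}` (three columns holding 2, 0 and 3 entries), shrinking to two columns yields `{2, 0}` and growing to five yields `{2, 0, 3, 0, 0}`.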

functions.

Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match the install layout.
Fixed: #2157

(only if not HIPCC)."
This reverts commit 12fd3dd655e37ba26e7ab236d32163e0aa35da39

available. Otherwise the accuracy drops from 1 ulp to 3 ulp.

not HIPCC).

macOS defines `int64_t` as `long long` even for C++03 and therefore expects
a template specialization `internal::make_unsigned<long long>` for C++03.
Since other platforms define `int64_t` as `long` for C++03, we
cannot add the specialization in all cases.
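A sketch of such a conditional specialization in C++, using a hypothetical `my_internal::make_unsigned`. The guard (`__APPLE__`, or any C++11 compiler where `long long` is standard) is an assumption for illustration. The essential point is that `long` and `long long` are distinct types, so `int64_t` resolves to whichever one the platform picked, and the trait needs a specialization for that one.

```cpp
#include <cstdint>
#include <type_traits>

namespace my_internal {  // hypothetical; not Eigen's actual definitions

template <typename T> struct make_unsigned;  // only the specializations below exist

template <> struct make_unsigned<signed char> { typedef unsigned char type; };
template <> struct make_unsigned<short> { typedef unsigned short type; };
template <> struct make_unsigned<int> { typedef unsigned int type; };
template <> struct make_unsigned<long> { typedef unsigned long type; };

// `long long` is not a C++03 type, but macOS typedefs int64_t to it even in
// C++03 mode, so the specialization is needed there (and is harmless in C++11).
#if defined(__APPLE__) || __cplusplus >= 201103L
template <> struct make_unsigned<long long> { typedef unsigned long long type; };
#endif

}  // namespace my_internal
```

With these specializations, `make_unsigned<std::int64_t>::type` is `std::uint64_t` whether the platform defines `int64_t` as `long` or as `long long`.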

for float, while still being 10x faster than `std::pow` for AVX-512. A future change will introduce a specialization for double.

The original implementation fails for 0, denormals, inf, and NaN.
See #2150