| Commit message (Collapse) | Author | Age |
|
|
|
| |
reductions.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
in certain situations.
|
|
|
|
|
|
| |
1. Compute only about half of the factors and use complex conjugate symmetry for the rest, instead of computing all of them, to save time.
2. Calculate all twiddles in double, because that gives the maximum achievable precision when doing float transforms.
3. Reduce all angles to the range 0 < angle < pi/4, which gives even more precision.
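
The first two points can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `make_twiddles` and the forward-transform sign convention are assumptions, and the angle reduction from point 3 is left out for brevity.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: twiddles w[k] = exp(-2*pi*i*k/n) for k in [0, n),
// stored as float but evaluated in double.
std::vector<std::complex<float> > make_twiddles(std::size_t n) {
  const double two_pi = 6.283185307179586476925286766559;
  std::vector<std::complex<float> > w(n);
  if (n == 0) return w;
  w[0] = std::complex<float>(1.0f, 0.0f);
  // Evaluate only k = 1 .. n/2 directly; round to float once at the end.
  for (std::size_t k = 1; k <= n / 2; ++k) {
    const double angle = -two_pi * static_cast<double>(k) / static_cast<double>(n);
    const std::complex<float> t(static_cast<float>(std::cos(angle)),
                                static_cast<float>(std::sin(angle)));
    w[k] = t;
    // Complex conjugate symmetry fills the other half: w[n - k] = conj(w[k]).
    w[n - k] = std::conj(t);
  }
  return w;
}
```

Calling `make_twiddles(8)` evaluates `cos`/`sin` for four angles only; the remaining entries are mirrored from those.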
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.
Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre-C++17.
Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.
For completeness, added nullary op support for C++03.
Fixes #1850.
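
A fallback of this kind might look roughly like the sketch below. The namespace `sketch`, the functor `Doubler`, and the `__cplusplus` guard are illustrative assumptions, not the actual `Eigen::internal::invoke_result` or `EIGEN_HAS_STD_INVOKE_RESULT` definitions. The sketch also assumes C++11 variadic templates, so it does not cover the C++03 path.

```cpp
#include <type_traits>

namespace sketch {
#if __cplusplus >= 201703L
// C++17 and later: forward to std::invoke_result.
template <typename F, typename... Args>
struct invoke_result {
  typedef typename std::invoke_result<F, Args...>::type type;
};
#else
// Before C++17: fall back to std::result_of, which is still available.
template <typename F, typename... Args>
struct invoke_result {
  typedef typename std::result_of<F(Args...)>::type type;
};
#endif
}  // namespace sketch

// Hypothetical functor used only to exercise the trait.
struct Doubler {
  double operator()(int x) const { return 2.0 * x; }
};
```

With either branch, `sketch::invoke_result<Doubler, int>::type` is `double`.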
|
|
|
|
| |
to non-const when using vector_pair with certain built-ins.
|
|
|
|
| |
HIP does not support new/delete on device, so test is skipped.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (UMFPACK)
does not match the name of the calling package (Umfpack). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.
|
|
|
|
|
|
|
| |
Accuracy is too poor - requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.
Fixes #2094.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.
`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.
Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang. Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
|
|
|
|
|
|
|
| |
double that
make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current
implementation.
|
|
|
|
|
| |
The previous implementation caused a buffer overflow trying to calculate
non-zero counts for columns that no longer exist.
|
| |
|
|
|
|
| |
functions.
|
|
|
|
|
|
|
| |
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.
Fixed: #2157
|
| |
|
| |
|
|
|
|
|
| |
(only if not HIPCC)."
This reverts commit 12fd3dd655e37ba26e7ab236d32163e0aa35da39
|
| |
|
|
|
|
| |
available. Otherwise the accuracy drops from 1 ulp to 3 ulp.
|
|
|
|
| |
not HIPCC).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization
internal::make_unsigned<long long>,
for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
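
The shape of the problem can be sketched as below. The namespace `sketch` and the preprocessor guard are assumptions made for illustration; the actual condition used in Eigen may differ.

```cpp
#include <cstdint>

namespace sketch {
// Primary template is left undefined, so a missing specialization is a
// compile-time error.
template <typename T> struct make_unsigned;
template <> struct make_unsigned<int>  { typedef unsigned int type; };
template <> struct make_unsigned<long> { typedef unsigned long type; };
// Assumed guard: provide the long long specialization where long long is
// known to exist (C++11 and later, or Apple platforms even in C++03 mode).
#if __cplusplus >= 201103L || defined(__APPLE__)
template <> struct make_unsigned<long long> { typedef unsigned long long type; };
#endif
}  // namespace sketch

// Resolves through <long> or <long long>, depending on how the platform
// defines int64_t.
typedef sketch::make_unsigned<std::int64_t>::type u64_from_trait;
```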
|
| |
|
| |
|
|
|
|
| |
for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.
|
|
|
|
|
|
| |
The original implementation fails for 0, denormals, inf, and NaN.
See #2150
|
| |
|
|
|
|
| |
kernel)
|
|
|
|
|
|
|
|
| |
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
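
The pattern is generic and can be shown with plain `assert`. The function below is a hypothetical stand-in, not the SuperLUSupport.h code.

```cpp
#include <cassert>
#include <vector>

// Warns in NDEBUG builds, because `size` is read only inside the assert:
//   const std::size_t size = v.size();
//   assert(size > 0);
//
// Fetching the value inside the assert leaves nothing unused when the
// assert compiles away.
inline int first_element(const std::vector<int>& v) {
  assert(v.size() > 0 && "vector must be non-empty");
  return v[0];
}
```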
|
| |
|
|
|
|
|
| |
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original clamping bounds on `_x` actually produce finite values:
```
exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38
exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.
This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
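
The observation about the old bounds can be checked with the standard library. In this sketch the helper name is hypothetical, and `89.0f` and `710.0` are arbitrary values just past the overflow threshold, not the bounds chosen in this commit.

```cpp
#include <cmath>
#include <limits>

// Hypothetical check: does exp() evaluated at a clamp bound still fit in T?
template <typename T>
bool exp_bound_is_finite(T bound) {
  const T y = std::exp(bound);
  return y <= std::numeric_limits<T>::max();
}
```

`exp_bound_is_finite(88.3762626647950f)` and `exp_bound_is_finite(709.437)` both hold, which is why clamping to those values can never yield `inf`, whereas `exp_bound_is_finite(89.0f)` and `exp_bound_is_finite(710.0)` do not.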
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits. See #2131 for a complete discussion,
and !375 for other possible implementations.
Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.
The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.
Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.
Fixes #2131.
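
A scalar sketch of the four-factor idea for `double` follows. The function names are hypothetical and the code is an illustration under stated assumptions; `pldexp_impl` remains the authoritative description. The point is that `2^e` itself may not be representable, while four factors with exponents near `e/4` always are.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Builds 2^b for b in the normal double range by writing the biased
// exponent field directly (assumes IEEE-754 binary64).
static double pow2_from_exponent(int b) {
  const std::uint64_t bits = static_cast<std::uint64_t>(b + 1023) << 52;
  double out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Hypothetical scalar version: a * 2^e computed as a * 2^b * 2^b * 2^b * 2^(e - 3b).
double ldexp_four_factor(double a, int e) {
  // Beyond +/-2099 the result is already 0 or inf for any finite nonzero a.
  if (e > 2099) e = 2099;
  if (e < -2099) e = -2099;
  // b = floor(e / 4), written out so it does not rely on shifting negatives.
  const int b = (e >= 0) ? (e / 4) : -((-e + 3) / 4);
  const double c = pow2_from_exponent(b);
  const double partial = a * c * c * c;  // a * 2^(3b)
  return partial * pow2_from_exponent(e - 3 * b);
}
```

For example, `ldexp_four_factor(1.0, -1074)` reaches the smallest denormal even though `2^-1074` could not be built as a single normal factor.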
|
| |
|
|
|
|
| |
As discussed in #2143 we remove editor specific comments.
|
| |
|
|
|
|
| |
test adjoint and transpose solves
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error. This prevents it from running on
the host as well.
Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host. Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).
Fixes #2139
|
|
|
|
| |
result is always zero or infinite unless x is one.
|
|
|
|
|
|
|
|
|
| |
We are potentially seeing some accuracy issues with these. Ideally we
would hand off to `float`, but that's not trivial with the current
setup.
We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.
By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).
This is a replacement of !379. See there for further discussion.
Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.
Fixes #2138.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately `std::bit_and` and the like are host-only functions prior
to C++14 (since they are not `constexpr`). They also never exist in the
global namespace, so the current implementation always fails to compile via
NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.
To overcome these limitations, we implement these functionals here.
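
Such replacements can be as small as the sketch below. The namespace `sketch` and the macro `SKETCH_DEVICE_FUNC` are hypothetical stand-ins (in Eigen the annotation would be `EIGEN_DEVICE_FUNC`), and the actual functors added by this commit may differ.

```cpp
// Expands to the CUDA host/device annotation under NVCC, and to nothing
// for an ordinary host compiler.
#if defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

namespace sketch {
template <typename T>
struct bit_and {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a & b; }
};
template <typename T>
struct bit_or {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a | b; }
};
template <typename T>
struct bit_xor {
  SKETCH_DEVICE_FUNC T operator()(const T& a, const T& b) const { return a ^ b; }
};
}  // namespace sketch
```

Because the functors live in their own namespace, nothing has to be imported from `std` or the global namespace on device.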
|