| Commit message (Collapse) | Author | Age |
... | |
| | | | |
|
| | | | |
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
: computeFromHessenberg can take any expression for matrixQ, not only an HouseholderSequence.
|
| | | |
| | | |
| | | |
| | | | |
in reducers.
|
| | | |
| | | |
| | | |
| | | | |
rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x...
|
| | | | |
|
| | | | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | | |
Use std::complex constructor instead of assignment from scalar
|
| | | | | |
|
|/ / / /
| | | |
| | | |
| | | |
| | | | |
Fixes GCC conversion to non-scalar type requested compile error when
using boost::multiprecision::cpp_dec_float_50 as scalar type.
|
| | | | |
|
| | | | |
|
|\ \ \ \ |
|
| | | | | |
|
| |\ \ \ \
| |/ / / /
|/| | | |
| | | | | |
Fix oversharding bug in parallelFor.
|
| | |_|/
| |/| | |
|
|/ / / |
|
| |/
|/| |
|
|\ \
| | |
| | |
| | |
| | |
| | | |
Derivative of the incomplete Gamma function and the sample of a Gamma random variable
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | | |
Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
variable.
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter
The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.
Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)
Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
|
| |/
|/|
| |
| |
| | |
specializations. Otherwise causes problems with small fixed size matrix multiplication (call to
0x00 in call_assignment_no_alias in debug mode or trap in release with CUDA 9.1).
|
|\ \
| | |
| | |
| | | |
Fix compilation of special functions without C99 math.
|
| |\ \
| |/ /
|/| | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | | |
indicate a function is not __declspec(nothrow)
|
| |/
| |
| |
| |
| |
| | |
The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly,
causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not
actually require C99 math, so now they are always available.
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
bug #1548
The macro EIGEN_IDEAL_MAX_ALIGN_BYTES is being incorrectly set to 32
on AVX512 builds. It should be set to 64. In the current code it is
only set to 64 if the macro EIGEN_VECTORIZE_AVX512 is defined. This
macro does get defined in AVX512 builds in Core, but only after Macros.h,
the file that defines EIGEN_IDEAL_MAX_ALIGN_BYTES, has been included.
This commit fixes the issue by setting EIGEN_IDEAL_MAX_ALIGN_BYTES to
64 if __AVX512F__ is defined.
|
| |
| |
| |
| |
| |
| | |
1. Added new packet functions using SIMD for NByOne, OneByN cases
2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions
|
|/
|
|
| |
PGI (the later being the one that has the wrong prototype).
|
|\
| |
| |
| |
| |
| | |
Exponentially scaled modified Bessel functions of order zero and one.
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
|
| | |
|
| | |
|