The commit that added the Bessel functions i0e and i1e placed the #ifdef/#endif incorrectly, causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not actually require C99 math, so they are now always available.
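A minimal sketch of the corrected guard layout (illustrative declarations only, not the exact Eigen source):

    // After the fix: i0e/i1e live outside the C99-math guard, so they are
    // compiled even when EIGEN_HAS_C99_MATH expands to 0.
    template <typename Scalar> Scalar i0e(Scalar x);  // always available
    template <typename Scalar> Scalar i1e(Scalar x);  // always available

    #if EIGEN_HAS_C99_MATH
    // Only functions that genuinely depend on C99 math stay guarded.
    template <typename Scalar> Scalar lgamma(Scalar x);
    #endif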
Derivative of the incomplete Gamma function and the sample of a Gamma random variable.

In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- the derivative of igamma with respect to the parameter a
* gamma_sample_der_alpha(alpha, sample) -- the reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the parameter alpha

The derivatives are computed by forward-mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha could be implemented via igamma_der_a, a separate function is more accurate and efficient because some terms cancel analytically. All three functions are implemented by a method parameterized with a "mode" that always computes the derivatives but only returns them when the mode requires it; the compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations for each mode.
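A minimal, self-contained sketch of that "mode" pattern (the names and the toy function are illustrative, not Eigen's actual igamma code): forward-mode differentiation carries (value, derivative) pairs through the same loop, and a compile-time mode selects the output so the optimizer can drop the unused strand.

    #include <cmath>
    #include <cstdio>

    enum class Mode { Value, Derivative };

    template <Mode M>
    double exp_series(double a, double x) {
      // Forward-mode differentiation of f(a) = exp(a * x) w.r.t. a:
      // every intermediate carries its value and its d/da together.
      double f = 1.0, df_da = 0.0, term = 1.0, dterm_da = 0.0;
      for (int k = 1; k <= 20; ++k) {
        // term_k = (a*x)^k / k!; differentiate the recurrence term *= a*x/k
        // using the *old* term and dterm_da before updating them.
        dterm_da = (dterm_da * a + term) * x / k;
        term *= a * x / k;
        f += term;
        df_da += dterm_da;
      }
      return M == Mode::Value ? f : df_da;  // unused strand is optimized away
    }

    int main() {
      // f(a) = exp(a*x) at a = 0.5, x = 1: value and derivative both equal exp(0.5).
      std::printf("value = %f (exp(0.5) = %f)\n",
                  exp_series<Mode::Value>(0.5, 1.0), std::exp(0.5));
      std::printf("d/da  = %f (x * exp(a*x) = exp(0.5))\n",
                  exp_series<Mode::Derivative>(0.5, 1.0));
    }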
This commit enables the use of Eigen in HIP kernels on AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels on NVidia GPUs.

Application code needs to define EIGEN_USE_HIP explicitly when using Eigen in HIP kernels, because some of the CUDA headers are picked up by default during an Eigen compile regardless of whether the underlying compiler is CUDACC/NVCC (e.g. Eigen/src/Core/arch/CUDA/Half.h). To preserve that behavior, the EIGEN_USE_HIP macro switches those includes to their HIP versions (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).

Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP-specific unit tests.
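A hypothetical minimal HIP translation unit under this scheme (the kernel and names are mine, not from the commit); the key point is that EIGEN_USE_HIP is defined before any Eigen header is included:

    #define EIGEN_USE_HIP            // must come before the Eigen includes
    #include <Eigen/Core>
    #include <hip/hip_runtime.h>

    // A trivial HIP kernel; Eigen device-side types and functions can be
    // used inside such kernels once EIGEN_USE_HIP selects the HIP paths.
    __global__ void axpy(int n, float a, const float* x, float* y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) y[i] = a * x[i] + y[i];
    }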
1. Added new packet functions using SIMD for the NByOne and OneByN cases.
2. Modified existing packet functions to reduce index calculations when the input stride is non-SIMD.
3. Added 4 test cases to cover the new packet functions.
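For orientation, the two shapes involved (my reading of the names: "NByOne" is an (N,1) tensor broadcast along its unit dimension, "OneByN" the transposed case) show up in ordinary broadcast calls like:

    #include <unsupported/Eigen/CXX11/Tensor>

    int main() {
      Eigen::Tensor<float, 2> col(4, 1);            // NByOne input
      col.setRandom();
      Eigen::array<Eigen::Index, 2> bcast{{1, 8}};  // replicate 8x along dim 1
      Eigen::Tensor<float, 2> tiled = col.broadcast(bcast);  // 4 x 8 result
      return tiled.dimension(1) == 8 ? 0 : 1;
    }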
The functions are conventionally called i0e and i1e. The exponentially scaled versions are more numerically stable: the standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x) and i1(x) = exp(|x|) i1e(x).
The code is ported from Cephes and tested against SciPy.
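For a sense of the scaling, a reference sketch using C++17's std::cyl_bessel_i (not the Cephes port itself; the standard function overflows for large |x|, which is exactly what a native i0e avoids):

    #include <cmath>
    #include <cstdio>

    // i0e(x) = exp(-|x|) * I0(x) stays O(1) even where I0(x) itself overflows.
    double i0e_ref(double x) {
      return std::exp(-std::abs(x)) * std::cyl_bessel_i(0.0, std::abs(x));
    }

    int main() {
      std::printf("i0e(1)  = %.6f\n", i0e_ref(1.0));   // ~0.465760
      std::printf("i0e(10) = %.6f\n", i0e_ref(10.0));  // ~0.127833
    }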
TensorFlow.
issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689
TensorFlow.
Bug #1464: fixes construction of EulerAngles from a 3D vector expression.
Approved-by: Tal Hadad <tal_hd@hotmail.com>
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>
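The fixed use case, assuming the module's usual typedefs (values are illustrative):

    #include <Eigen/Dense>
    #include <unsupported/Eigen/EulerAngles>

    int main() {
      Eigen::Vector3d a(0.1, 0.2, 0.3), b(0.05, 0.0, -0.1);
      // Construction from a 3D vector *expression* (a + b), per bug #1464.
      Eigen::EulerAnglesXYZd angles(a + b);
      return 0;
    }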
introducing EIGEN_CUDACC_VER
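CUDA 9 removed the old __CUDACC_VER__ macro, so a usable version number has to be reassembled from the major/minor macros; EIGEN_CUDACC_VER presumably takes roughly this shape (a sketch, not necessarily Eigen's exact definition):

    #if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 9)
      #define EIGEN_CUDACC_VER ((__CUDACC_VER_MAJOR__ * 10000) + (__CUDACC_VER_MINOR__ * 100))
    #elif defined(__CUDACC_VER__)
      #define EIGEN_CUDACC_VER __CUDACC_VER__
    #else
      #define EIGEN_CUDACC_VER 0
    #endif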
Improved support for OpenCL
Tensor Trace op
request PR-14)
* Applying Benoit's comment for fixing ImageVolumePatch; fixing a conflict in the cmake file.
* Fixing deallocation of the memory in the ImagePatch test for SYCL.
* Fixing the automerge issue.
Improved support for OpenCL
Eigen is now able to use triSYCL with the EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options.
Fixed the contraction kernel to use the correct nd_item dimension.
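For illustration, a configure line using these options might look like (paths are placeholders): cmake -DEIGEN_SYCL_TRISYCL=ON -DTRISYCL_INCLUDE_DIR=/path/to/triSYCL/include /path/to/eigen-source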
| |
in MeanReducer.
Improves support for std::complex types when compiling for CUDA.
Expands on e2e9cdd16970914cf0a892fea5e7c4402b3ede41
and 2bda1b0d93fb627d0c500ec48b20302d44c32cb7
.
dims to be int in Argmax.
thread
cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record-yielder op in a graph creates a thread pool with a thread that spins for 1000 iterations through the work-stealing loop before yielding. This is wasteful for such ops that process I/O.
* Changes the number of iterations through the steal loop to be inversely proportional to the number of threads; since the time of each iteration is proportional to the number of threads, this yields a roughly constant spin time (see the sketch after this list).
* Implements a separate worker loop for the num_threads == 1 case, since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues, it might reverse the order in which ops are executed relative to the order in which they were scheduled, which is usually counter-productive for the I/O workloads that single-thread pools tend to be used for.
* Stores num_threads in a member variable, for simplicity and to avoid a data race between the thread-creation loop and worker threads calling threads_.size().
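A hedged sketch of the spin policy just described (constants and names are assumptions, not Eigen's actual ThreadPool code):

    // The spin budget shrinks as the pool grows: one steal probe visits
    // O(num_threads) queues, so budget/num_threads iterations keep the
    // total spin time roughly constant.
    int SpinIterations(int num_threads) {
      const int kSpinBudget = 5000;  // assumed tuning constant
      return kSpinBudget / num_threads;
    }

    void WorkerLoop(int num_threads, bool allow_spinning) {
      if (num_threads == 1) {
        // Dedicated loop: pop from this thread's own queue in order; no
        // Steal()/PopBack(), so scheduling order is preserved for I/O ops.
      } else if (allow_spinning) {
        for (int i = 0; i < SpinIterations(num_threads); ++i) {
          // ...try to steal work from a random victim queue...
        }
        // ...then block on the pool's wakeup mechanism.
      }
    }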
sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to a data mismatch.
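For context on the discard_write to write change (SYCL 1.2-style accessor modes; the buffer and kernel names here are illustrative): discard_write permits the runtime to invalidate the buffer's previous contents, so a kernel that writes only part of the buffer leaves the rest undefined, which matches the data-mismatch symptom.

    #include <CL/sycl.hpp>

    void fill_first_half(cl::sycl::queue& q, cl::sycl::buffer<float, 1>& buf) {
      q.submit([&](cl::sycl::handler& cgh) {
        // access::mode::write preserves the untouched tail of the buffer;
        // access::mode::discard_write would allow it to be dropped first.
        auto acc = buf.get_access<cl::sycl::access::mode::write>(cgh);
        cgh.parallel_for<class FillFirstHalf>(
            cl::sycl::range<1>(buf.get_count() / 2),
            [=](cl::sycl::id<1> i) { acc[i] = 1.0f; });
      });
    }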
issue on sycl when the rhs is a TensorContraction, reduction, or convolution; fixing the partial modification for memset when the sycl backend is used.
TensorInflation.h.
verification operation for cxx11_tensorChipping.cpp test