* Adding missing operations for vector comparison in SYCL; their absence caused a compiler error for vector comparisons when compiling SYCL.
* Fixing the compiler error for placement new in TensorForcedEval.h, which broke compilation of the SYCL backend.
* Reducing SYCL warnings by removing the abort function inside the kernel.
* Adding strong inline to functions inside the SYCL interop.
Add async evaluation to a number of ops.
TensorSlicing
https://bitbucket.org/eigen/eigen/pull-requests/662.
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:

Benchmark                       Time(ns)   CPU(ns)  Iterations
--------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     128173    231326        2922  1.610G items/s

vs.

Benchmark                       Time(ns)   CPU(ns)  Iterations
--------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     146683    246914        2719  1.509G items/s
block access when preferred
module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying TensorDeviceSycl.h and the SYCL executor method to adopt the above changes.
unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c61b764c73fd1570b1f3ed1c6400dd7e
evaluators
codeplaysoftware/eigen-upstream-pure/separating_internal_memory_allocation (pull request PR-446)
Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation.
user memory allocation/deallocation.
This commit enables the use of Eigen in HIP kernels / on AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / on NVidia GPUs.
Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels, because some of the CUDA headers are picked up by default during an Eigen compile, irrespective of whether the underlying compiler is CUDACC/NVCC (e.g. Eigen/src/Core/arch/CUDA/Half.h). To maintain this behavior, the EIGEN_USE_HIP macro is used to switch to the HIP versions of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).
Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP-specific unit tests.
DataDependancy
* Wrapping the data type in the pointer class for SYCL in non-terminal nodes; not having that breaks the TensorFlow Conv2d code.
* Applying Ronan's comments.
* Applying Benoit's comments.
TensorConvolutionOp for ComputeCpp; fixing typos; modifying TensorDeviceSycl to use the LegacyPointer class.
Tensor Contractsycl to be located in any place in the expression tree.
name-duplication error; adding the TensorConcatenationOp backend for SYCL.
available on DSize.
estimate the cost of evaluating tensor expressions.
All the vectorization is now defined in the tensor evaluators. This will make it possible to reliably support devices with different packet types in the same compilation unit.
TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
the evaluation of an expression.
internal::is_arithmetic<T>::value to check whether it's possible to bypass the type constructor in the tensor code.
previously disabled by mistake
Misc fixes and API cleanups.
More tests
efficiently compute convolutions and contractions in the future:
* The scheduling of computation is moved out of the assignment code and into a new TensorExecutor class.
* The assignment itself is now a regular node on the expression tree.
* The expression evaluators start by recursively evaluating all their subexpressions if needed.