path: root/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h
Commit log (message, author, date):
* Fail at compile time if default executor tries to use non-default device (Eugene Zhulenev, 2020-02-06)
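    A hedged sketch of the compile-time check described above, using hypothetical type and
    executor names rather than Eigen's actual internals: a static_assert rejects any device
    other than the default one as soon as the default executor is instantiated.

        #include <type_traits>

        struct DefaultDevice {};  // hypothetical stand-in for the default device
        struct GpuDevice {};      // hypothetical non-default device

        // Hypothetical default executor: only DefaultDevice is accepted; any other
        // device type fails to compile with a readable diagnostic.
        template <typename Expression, typename Device>
        struct DefaultExecutor {
          static_assert(std::is_same<Device, DefaultDevice>::value,
                        "Default executor instantiated with a non-default device; "
                        "use a device-specific executor instead.");
          static void run(const Expression&, const Device&) {}
        };

    Instantiating DefaultExecutor<Expr, GpuDevice> now fails at compile time instead of
    silently evaluating the expression on the wrong device.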
* Tensor block evaluation cost model (Eugene Zhulenev, 2019-12-18)
* Reduce block evaluation overhead for small tensor expressions (Eugene Zhulenev, 2019-12-17)
* Add back accidentally deleted default constructor to TensorExecutorTilingContext (Eugene Zhulenev, 2019-12-11)
* Remove block memory allocation required by removed block evaluation API (Eugene Zhulenev, 2019-12-10)
* Remove V2 suffix from TensorBlock (Eugene Zhulenev, 2019-12-10)
* Remove TensorBlock.h and old TensorBlock/BlockMapper (Eugene Zhulenev, 2019-12-10)
* Do not use std::vector in getResourceRequirements (Eugene Zhulenev, 2019-12-09)
* Use EIGEN_DEVICE_FUNC macro instead of __device__. (Rasmus Munk Larsen, 2019-12-03)
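    For context, a hedged sketch of the annotation style this entry refers to; the helper
    function below is hypothetical, not code from TensorExecutor.h. Under a CUDA/HIP
    compiler EIGEN_DEVICE_FUNC expands to __host__ __device__, while a plain host build
    sees an empty macro, so one annotation covers both compilation modes.

        #include <Eigen/Core>  // provides the EIGEN_DEVICE_FUNC macro

        // Callable from host code and from GPU kernels when compiled with nvcc/hipcc.
        EIGEN_DEVICE_FUNC inline float clamp01(float x) {
          return x < 0.f ? 0.f : (x > 1.f ? 1.f : x);
        }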
* [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. (Mehdi Goli, 2019-11-28)
    * Unifying all loadLocalTile from lhs and rhs to an extract_block function.
    * Adding get_tensor operation which was missing in TensorContractionMapper.
    * Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
    * Wrapping all the indices in TensorScanSycl into Scan parameter struct.
    * Fixing typo in Device SYCL.
    * Unifying load to private register for tall/skinny no shared.
    * Unifying load to vector tile for tensor-vector/vector-tensor operation.
    * Removing all the LHS/RHS class for extracting data from global.
    * Removing Outputfunction from TensorContractionSkinnyNoshared.
    * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
    * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
    * Combining General Tensor-Vector and VectorTensor contraction into one kernel.
    * Making double buffering optional for Tensor contraction when local memory is used.
    * Modifying benchmark to accept custom Reduction Sizes.
    * Disabling AVX optimization for the SYCL backend on the host to allow SSE optimization on the host.
    * Adding Test for SYCL.
    * Modifying SYCL CMake.
* Remove legacy block evaluation support (Eugene Zhulenev, 2019-11-12)
* Fix a race in async tensor evaluation: Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs. (Rasmus Munk Larsen, 2019-11-11)
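    A hedged sketch of the ordering constraint described above, with hypothetical names
    (finish_async_eval and its template parameters) rather than Eigen's actual types: every
    cleanup step that still touches the device runs before the completion callback, because
    the caller may destroy the device from inside or right after on_done().

        #include <functional>
        #include <utility>

        // Hypothetical completion path for an async tensor evaluation.
        template <typename Evaluator, typename Device>
        void finish_async_eval(Evaluator& evaluator, Device& device, void* buffer,
                               std::function<void()> on_done) {
          // 1. Release everything that still needs the device...
          evaluator.cleanup();
          device.deallocate(buffer);
          // 2. ...and only then notify the caller. Past this point the device must not
          //    be touched again, since on_done() may trigger its destruction.
          auto done = std::move(on_done);
          done();
        }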
* Prevent potential ODR violation in TensorExecutor (Eugene Zhulenev, 2019-10-28)
* Add block evaluation V2 to TensorAsyncExecutor (Rasmus Munk Larsen, 2019-10-22)
    Add async evaluation to a number of ops.
* Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers. (Rasmus Munk Larsen, 2019-10-18)
* Block evaluation for TensorGenerator/TensorReverse/TensorShuffling (Eugene Zhulenev, 2019-10-14)
* Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing (Eugene Zhulenev, 2019-10-09)
* Add block evaluation to TensorEvalTo and fix a few small bugs (Eugene Zhulenev, 2019-10-07)
* Tensor block evaluation V2 support for unary/binary/broadcasting (Eugene Zhulenev, 2019-09-24)
* Allow move-only done callback in TensorAsyncDevice (Eugene Zhulenev, 2019-09-03)
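    A hedged illustration of what "move-only done callback" means in practice: the lambda
    below owns a std::unique_ptr, so it can be moved but not copied. Whether this exact
    pattern is accepted depends on the TensorAsyncDevice API of the Eigen version in use;
    the snippet only demonstrates the callable itself.

        #include <memory>
        #include <utility>

        struct CompletionState { bool done = false; };  // hypothetical per-request state

        int main() {
          auto state = std::make_unique<CompletionState>();
          // Move-only callable: copying it is ill-formed, so the async executor has to
          // move it into place instead of copying it.
          auto on_done = [state = std::move(state)]() mutable {
            state->done = true;  // e.g. notify a barrier or fulfil a promise here
          };
          // In async tensor evaluation this callable would be moved into the device,
          // roughly: dst.device(thread_pool_device, std::move(on_done)) = expression;
          on_done();  // invoked directly here only to keep the sketch self-contained
          return 0;
        }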
* Fix block mapper type name in TensorExecutor (Eugene Zhulenev, 2019-08-30)
* evalSubExprsIfNeededAsync + async TensorContractionThreadPool (Eugene Zhulenev, 2019-08-30)
* Asynchronous expression evaluation with TensorAsyncDevice (Eugene Zhulenev, 2019-08-30)
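    A hedged usage sketch of asynchronous evaluation on the thread-pool device, assuming
    the device(device, done_callback) overload this entry introduces; the pool size and
    the promise/future synchronization are illustrative, and the exact API may differ
    between Eigen versions.

        #define EIGEN_USE_THREADS
        #include <unsupported/Eigen/CXX11/Tensor>
        #include <future>

        int main() {
          Eigen::ThreadPool pool(4);
          Eigen::ThreadPoolDevice device(&pool, 4);

          Eigen::Tensor<float, 2> a(256, 256), b(256, 256), c(256, 256);
          a.setRandom();
          b.setRandom();

          // The (device, done) overload evaluates the assignment asynchronously and
          // invokes the callback once the result has been fully written.
          std::promise<void> done_promise;
          std::future<void> done = done_promise.get_future();
          c.device(device, [&done_promise]() { done_promise.set_value(); }) = a + b * 0.5f;

          done.wait();  // block until the async evaluation has finished
          return 0;
        }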
* Allocate non-const scalar buffer for block evaluation with DefaultDevice (Eugene Zhulenev, 2019-07-01)
* Merge with Eigen head (Eugene Zhulenev, 2019-06-28)
* Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred (Eugene Zhulenev, 2019-06-28)
* [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Mehdi Goli, 2019-06-28)
    * Abstracting the pointer type so that both SYCL memory and pointer can be captured.
    * Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
    * Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
    * Adding SYCL macro for controlling loop unrolling.
    * Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
* Prevent potential division by zero in TensorExecutor (Eugene Zhulenev, 2019-05-17)
* Always evaluate Tensor expressions with broadcasting via tiled evaluation code path (Eugene Zhulenev, 2019-05-16)
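    For reference, a small sketch of the kind of broadcasting expression affected by the
    entry above (the sizes and broadcast factors are arbitrary); with this change such
    expressions always go through the tiled/block evaluation path instead of the flat
    per-coefficient loop.

        #include <unsupported/Eigen/CXX11/Tensor>

        int main() {
          // A 4x1 column broadcast to a 4x8 result.
          Eigen::Tensor<float, 2> column(4, 1);
          column.setRandom();

          Eigen::array<Eigen::Index, 2> bcast{{1, 8}};  // replicate 8x along dimension 1
          Eigen::Tensor<float, 2> tiled = column.broadcast(bcast) * 2.0f;

          return tiled.dimension(1) == 8 ? 0 : 1;
        }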
* Fix segfaults with cuda compilation (Eugene Zhulenev, 2019-03-11)
* Fix placement of "#if defined(EIGEN_GPUCC)" guard region. (Rasmus Munk Larsen, 2019-03-06)
    Found with -Wundefined-func-template. Author: tkoeppe@google.com
* Fix shadowing of last and all (Gael Guennebaud, 2018-09-21)
* Explicitly construct tensor block dimensions from evaluator dimensions (Eugene Zhulenev, 2018-09-14)
* Merge with upstream eigen/default (Eugene Zhulenev, 2018-08-27)
* Fix several integer conversion and sign-compare warnings (Christoph Hertzberg, 2018-08-24)
* Removed an unused variable (PacketSize) from TensorExecutor (Sameer Agarwal, 2018-08-15)
* Fixed more compilation errors (Benoit Steiner, 2018-08-15)
* Merge with eigen/default (Eugene Zhulenev, 2018-08-10)
* Code cleanup (Benoit Steiner, 2018-08-13)
* Use NULL instead of nullptr to avoid adding a cxx11 requirement. (Benoit Steiner, 2018-08-13)
* Avoided language features that are only available in cxx11 mode. (Benoit Steiner, 2018-08-10)
* Fix bug in a test + compilation errors (Eugene Zhulenev, 2018-08-09)
* Replace all using declarations with typedefs in Tensor ops (Eugene Zhulenev, 2018-08-01)
* Converting ad-hoc inline keyword to EIGEN_STRONG_INLINE macro. (Mehdi Goli, 2018-08-01)
* Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible (Eugene Zhulenev, 2018-07-27)
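    A hedged sketch of the Eigen::Map + Eigen::Array pattern mentioned above, applied to a
    raw dimensions buffer; the StorageIndex alias and the buffer are hypothetical stand-ins,
    not the actual TensorExecutor code.

        #include <Eigen/Core>

        int main() {
          using StorageIndex = Eigen::Index;  // hypothetical index type choice

          // Raw buffer of tensor dimensions, e.g. handed over by an evaluator.
          StorageIndex raw_dims[3] = {4, 8, 16};

          // View the buffer as a fixed-size Eigen::Array without copying it.
          Eigen::Map<const Eigen::Array<StorageIndex, 3, 1>> dims(raw_dims);

          StorageIndex total = dims.prod();  // 4 * 8 * 16 = 512 coefficients
          return total == 512 ? 0 : 1;
        }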
* Add tiled evaluation support to TensorExecutor (Eugene Zhulenev, 2018-07-25)
* Remove SimpleThreadPool and always use {NonBlocking}ThreadPool (Eugene Zhulenev, 2018-07-16)
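    For context, a hedged sketch of how the thread-pool backend mentioned above is driven
    from user code (the pool size of 4 is arbitrary); after this change the non-blocking
    Eigen::ThreadPool is the only pool implementation behind ThreadPoolDevice.

        #define EIGEN_USE_THREADS
        #include <unsupported/Eigen/CXX11/Tensor>

        int main() {
          Eigen::ThreadPool pool(4);                 // 4 worker threads
          Eigen::ThreadPoolDevice device(&pool, 4);

          Eigen::Tensor<float, 2> a(128, 128), b(128, 128), out(128, 128);
          a.setRandom();
          b.setRandom();

          // TensorExecutor evaluates the expression in parallel on the thread pool.
          out.device(device) = a * a + b;
          return 0;
        }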
* Merging the CUDA and HIP implementation for the Tensor directory and the unit tests (Deven Desai, 2018-06-20)
* Updates based on PR feedback (Deven Desai, 2018-06-14)
    There are two major changes (and a few minor ones which are not listed here; see PR discussion for details):
    1. Eigen::half implementations for HIP and CUDA have been merged. This means that
       - `CUDA/Half.h` and `HIP/hcc/Half.h` got merged into a new file `GPU/Half.h`
       - `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged into a new file `GPU/PacketMathHalf.h`
       - `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged into a new file `GPU/TypeCasting.h`
       After this change the `HIP/hcc` directory only contains one file, `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
    2. New macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
       - `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
       - `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
       - `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 || EIGEN_HAS_HIP_FP16)`
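    A hedged sketch of the macro relationships listed in that entry, written as plain
    preprocessor checks; it mirrors the equivalences stated above rather than quoting
    Eigen's actual macro definitions.

        // Unified "compiling with a GPU compiler" flag.
        #if defined(EIGEN_CUDACC) || defined(EIGEN_HIPCC)
        #define EIGEN_GPUCC
        #endif

        // Unified "currently in the device compilation phase" flag.
        #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE)
        #define EIGEN_GPU_DEVICE_COMPILE
        #endif

        // Unified "GPU half-precision support available" flag.
        #if defined(EIGEN_HAS_CUDA_FP16) || defined(EIGEN_HAS_HIP_FP16)
        #define EIGEN_HAS_GPU_FP16
        #endif

        // Example use: guard GPU-compiler-specific code with one check instead of two.
        #if defined(EIGEN_GPUCC)
        // ... code that needs a CUDA or HIP compiler ...
        #endif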
* Adding support for using Eigen in HIP kernels (Deven Desai, 2018-06-06)
    This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.
    Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during an Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).
    Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP-specific unit tests.
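    A hedged sketch of what this entry describes from the application side: EIGEN_USE_HIP
    is defined before any Eigen header is included so the HIP variants of the GPU headers
    are selected, and the kernel itself is ordinary HIP device code using fixed-size Eigen
    types. The kernel body and names are illustrative only.

        // Compile with hipcc. See Eigen/Core and unsupported/Eigen/CXX11/Tensor for the
        // headers that honour EIGEN_USE_HIP; use the "-DEIGEN_TEST_HIP" cmake option to
        // enable Eigen's HIP-specific unit tests.
        #define EIGEN_USE_HIP
        #include <hip/hip_runtime.h>
        #include <Eigen/Core>

        // Illustrative HIP kernel: per-thread 3-vector addition with Eigen::Map.
        __global__ void add_vec3(const float* a, const float* b, float* out, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) {
            Eigen::Map<const Eigen::Vector3f> va(a + 3 * i);
            Eigen::Map<const Eigen::Vector3f> vb(b + 3 * i);
            Eigen::Map<Eigen::Vector3f> vo(out + 3 * i);
            vo = va + vb;  // fixed-size Eigen arithmetic inside the device kernel
          }
        }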