path: root/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
Commit message (Author, Date)
* Allow custom TENSOR_CONTRACTION_DISPATCH macro. (Antonio Sanchez, 2021-06-11)
  Currently TF Lite needs to hack around with the Tensor headers in order to customize the contraction dispatch method. Here we add simple `#ifndef` guards to allow them to provide their own dispatch prior to inclusion.
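  A hedged sketch of the resulting pattern (the macro name is real; the parameter list and the client-side macro body shown here are simplified illustrations, not the exact definition in TensorContraction.h):

      // Inside TensorContraction.h the default dispatch is now guarded, so a
      // client that defines TENSOR_CONTRACTION_DISPATCH before including the
      // Tensor headers takes precedence:
      #ifndef TENSOR_CONTRACTION_DISPATCH
      #define TENSOR_CONTRACTION_DISPATCH(METHOD, ALIGNMENT, ARGS) \
        METHOD<ALIGNMENT> ARGS
      #endif

      // Hypothetical client translation unit (e.g. TF Lite):
      //   #define TENSOR_CONTRACTION_DISPATCH(METHOD, ALIGNMENT, ARGS) \
      //     MyCustomDispatch<ALIGNMENT> ARGS
      //   #include <unsupported/Eigen/CXX11/Tensor>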
* Fix calls to device functions from host code (Nathan Luehr, 2021-05-11)
* Inherit from `no_assignment_operator` to avoid implicit copy constructor warnings (Christoph Hertzberg, 2021-02-27)
  (cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
* Remove V2 suffix from TensorBlock (Eugene Zhulenev, 2019-12-10)
* Remove legacy block evaluation support (Eugene Zhulenev, 2019-11-12)
* Add beta to TensorContractionKernel and make memset optional (Eugene Zhulenev, 2019-10-02)
* Tensor block evaluation V2 support for unary/binary/broadcasting (Eugene Zhulenev, 2019-09-24)
* evalSubExprsIfNeededAsync + async TensorContractionThreadPool (Eugene Zhulenev, 2019-08-30)
* Remove XSMM support from Tensor module (Eugene Zhulenev, 2019-08-19)
* [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL. (Mehdi Goli, 2019-06-28)
  * Abstracting the pointer type so that both SYCL memory and pointers can be captured (a simplified sketch follows this entry).
  * Converting the SYCL virtual pointer to SYCL device memory in the Eigen evaluator class.
  * Binding the SYCL placeholder accessor to the command group handler by using the bind method in the Eigen evaluator node.
  * Adding a SYCL macro for controlling loop unrolling.
  * Modifying TensorDeviceSycl.h and the SYCL executor method to adopt the above changes.
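  The first bullet (pointer-type abstraction) amounts to a device-parameterized pointer trait. A hedged, simplified sketch with hypothetical names (StoragePointer, DeviceMemoryHandle), not Eigen's actual machinery:

      // Evaluators ask the trait for "a pointer to T on this device" instead
      // of hard-coding T*, so a SYCL build can substitute a memory handle.
      template <typename T, typename Device>
      struct StoragePointer {
        typedef T* Type;  // default: a plain host pointer
      };

      struct SyclDevice;  // stand-in for the real device type
      template <typename T>
      struct DeviceMemoryHandle { /* wraps SYCL device memory */ };

      template <typename T>
      struct StoragePointer<T, SyclDevice> {
        typedef DeviceMemoryHandle<T> Type;  // hypothetical SYCL substitute
      };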
* Fix stupid shadow-warnings (with old clang versions) (Christoph Hertzberg, 2019-05-07)
* Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix a few other doxygen warnings (Eugene Zhulenev, 2019-04-23)
* adding EIGEN_DEVICE_FUNC to the recently added TensorContractionKernel constructor (Deven Desai, 2019-04-08)
  Not having the EIGEN_DEVICE_FUNC attribute on it was leading to compiler errors when compiling Eigen in the ROCm/HIP path.
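  For context, a minimal sketch of the kind of fix this describes (hypothetical class; EIGEN_DEVICE_FUNC expands to __host__ __device__ when compiling for CUDA or HIP, and to nothing on plain host builds):

      template <typename Scalar>
      struct ContractionKernelSketch {
        // Without EIGEN_DEVICE_FUNC this constructor would be host-only, so
        // instantiating the kernel from device code fails under ROCm/HIP.
        EIGEN_DEVICE_FUNC ContractionKernelSketch(Scalar alpha) : alpha_(alpha) {}
        Scalar alpha_;
      };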
* Add missing semicolon (Eugene Zhulenev, 2019-04-02)
* Add support for custom packed Lhs/Rhs blocks in tensor contractions (Eugene Zhulenev, 2019-04-01)
* Fix incorrect value of NumDimensions in TensorContraction traits. (Rasmus Munk Larsen, 2019-02-19)
  Reported here: #1671
* Make code compile in C++03 mode again (Christoph Hertzberg, 2018-10-02)
* Merged in deven-amd/eigen/HIP_fixes (pull request PR-518) (Christoph Hertzberg, 2018-10-01)
  PR with HIP-specific fixes (for the Eigen nightly regression failures in HIP mode)
* This commit contains the following (HIP specific) updates: (Deven Desai, 2018-10-01)
  - unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
    Changing a "pass-by-reference" argument to be "pass-by-value" instead (in a __global__ function declaration). "Pass-by-reference" arguments to __global__ functions are unwise, and will be explicitly flagged as errors by newer versions of HIP (a minimal illustration follows this entry).
  - Eigen/src/Core/util/Memory.h
  - unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
    Changes introduced in recent commits break the HIP compile. Adding the EIGEN_DEVICE_FUNC attribute to some functions and calling ::malloc/free instead of the corresponding std:: versions gets the HIP compile working again.
  - unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
    A change introduced by a recent commit breaks the HIP compile (the link stage errors out due to a failure to inline a function). Disabling the recently introduced code (only for the HIP compile) to get the Eigen nightly testing going again. Will submit another PR once we have the proper fix.
  - Eigen/src/Core/util/ConfigureVectorization.h
    Enabling GPU VECTOR support when the HIP compiler is in use (for both the host and device compile phases).
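  A minimal illustration of the first item (hypothetical kernel, not the actual TensorReductionGpu code): __global__ arguments are copied to the device, so a reference parameter would alias host memory.

      // Problematic: __global__ void scale(float& factor, float* out, int n);
      __global__ void scale(float factor, float* out, int n) {  // pass by value
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] *= factor;
      }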
* Get rid of unused variable warning. (Rasmus Munk Larsen, 2018-09-28)
* Add tests for evalShardedByInnerDim contraction + fix bugs (Eugene Zhulenev, 2018-09-28)
* Revert code lost in merge (Eugene Zhulenev, 2018-09-27)
* Merge with eigen/eigen default (Eugene Zhulenev, 2018-09-27)
* Remove explicit mkldnn support and redundant TensorContractionKernelBlocking (Eugene Zhulenev, 2018-09-27)
* Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large (Rasmus Munk Larsen, 2018-09-26)
  This speeds up individual matmul microbenchmarks by up to 85%. Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server. A simplified sketch of the sharding idea follows after the table.

  Benchmark                   Base (ns)   New (ns)   Improvement
  --------------------------------------------------------------
  BM_Matmul_1_80_13522_1        387457     396013        -2.2%
  BM_Matmul_1_80_13522_2        406487     230789       +43.2%
  BM_Matmul_1_80_13522_4        395821     123211       +68.9%
  BM_Matmul_1_80_13522_6        391625      97002       +75.2%
  BM_Matmul_1_80_13522_8        408986     113828       +72.2%
  BM_Matmul_1_80_13522_16       399988      67600       +83.1%
  BM_Matmul_1_80_13522_22       411546      60044       +85.4%
  BM_Matmul_1_80_13522_32       393528      57312       +85.4%
  BM_Matmul_1_80_13522_44       390047      63525       +83.7%
  BM_Matmul_1_80_13522_88       387876      63592       +83.6%
  BM_Matmul_1_1500_500_1        245359     248119        -1.1%
  BM_Matmul_1_1500_500_2        401833     143271       +64.3%
  BM_Matmul_1_1500_500_4        210519     100231       +52.4%
  BM_Matmul_1_1500_500_6        251582      86575       +65.6%
  BM_Matmul_1_1500_500_8        211499      80444       +62.0%
  BM_Matmul_3_250_512_1          70297      68551        +2.5%
  BM_Matmul_3_250_512_2          70141      52450       +25.2%
  BM_Matmul_3_250_512_4          67872      58204       +14.2%
  BM_Matmul_3_250_512_6          71378      63340       +11.3%
  BM_Matmul_3_250_512_8          69595      41652       +40.2%
  BM_Matmul_3_250_512_16         72055      42549       +40.9%
  BM_Matmul_3_250_512_22         70158      54023       +23.0%
  BM_Matmul_3_250_512_32         71541      56042       +21.7%
  BM_Matmul_3_250_512_44         71843      57019       +20.6%
  BM_Matmul_3_250_512_88         69951      54045       +22.7%
  BM_Matmul_3_1500_512_1        369328     374284        -1.4%
  BM_Matmul_3_1500_512_2        428656     223603       +47.8%
  BM_Matmul_3_1500_512_4        205599     139508       +32.1%
  BM_Matmul_3_1500_512_6        214278     139071       +35.1%
  BM_Matmul_3_1500_512_8        184149     142338       +22.7%
  BM_Matmul_3_1500_512_16       156462     156983        -0.3%
  BM_Matmul_3_1500_512_22       163905     158259        +3.4%
  BM_Matmul_3_1500_512_32       155314     157662        -1.5%
  BM_Matmul_3_1500_512_44       235434     158657       +32.6%
  BM_Matmul_3_1500_512_88       156779     160275        -2.2%
  BM_Matmul_1500_4_512_1        363358     349528        +3.8%
  BM_Matmul_1500_4_512_2        303134     263319       +13.1%
  BM_Matmul_1500_4_512_4        176208     130086       +26.2%
  BM_Matmul_1500_4_512_6        148026     115449       +22.0%
  BM_Matmul_1500_4_512_8        131656      98421       +25.2%
  BM_Matmul_1500_4_512_16       134011      82861       +38.2%
  BM_Matmul_1500_4_512_22       134950      85685       +36.5%
  BM_Matmul_1500_4_512_32       133165      90081       +32.4%
  BM_Matmul_1500_4_512_44       133203      90644       +32.0%
  BM_Matmul_1500_4_512_88       134106     100566       +25.0%
  BM_Matmul_4_1500_512_1        439243     435058        +1.0%
  BM_Matmul_4_1500_512_2        451830     257032       +43.1%
  BM_Matmul_4_1500_512_4        276434     164513       +40.5%
  BM_Matmul_4_1500_512_6        182542     144827       +20.7%
  BM_Matmul_4_1500_512_8        179411     166256        +7.3%
  BM_Matmul_4_1500_512_16       158101     155560        +1.6%
  BM_Matmul_4_1500_512_22       152435     155448        -1.9%
  BM_Matmul_4_1500_512_32       155150     149538        +3.6%
  BM_Matmul_4_1500_512_44       193842     149777       +22.7%
  BM_Matmul_4_1500_512_88       149544     154468        -3.3%
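  A simplified C++ sketch of the sharding idea (not Eigen's actual evalShardedByInnerDim implementation; all names hypothetical): split the contraction dimension k across threads, give each thread its own m*n accumulation buffer, then reduce the partial sums.

      #include <algorithm>
      #include <thread>
      #include <vector>

      void matmul_sharded_by_k(const float* A, const float* B, float* C,
                               int m, int n, int k, int num_threads) {
        std::vector<std::vector<float>> partial(
            num_threads, std::vector<float>(m * n, 0.0f));
        std::vector<std::thread> workers;
        const int chunk = (k + num_threads - 1) / num_threads;
        for (int t = 0; t < num_threads; ++t) {
          workers.emplace_back([&, t] {
            const int k0 = t * chunk, k1 = std::min(k, k0 + chunk);
            for (int p = k0; p < k1; ++p)      // each thread owns a slice of k
              for (int i = 0; i < m; ++i)
                for (int j = 0; j < n; ++j)
                  partial[t][i * n + j] += A[i * k + p] * B[p * n + j];
          });
        }
        for (auto& w : workers) w.join();
        for (int i = 0; i < m * n; ++i) {      // reduce the per-thread buffers
          C[i] = 0.0f;
          for (int t = 0; t < num_threads; ++t) C[i] += partial[t][i];
        }
      }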
* Fix regression introduced by the previous fix for AVX512. (Gael Guennebaud, 2018-09-20)
  It broke the complex-complex case on SSE.
* Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512) (Gael Guennebaud, 2018-09-20)
  This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyway.
* Merge with upstream eigen/default (Eugene Zhulenev, 2018-08-27)
* Fixed more sign-compare and type-limits warnings (Christoph Hertzberg, 2018-08-24)
* Use actual types instead of the auto keyword to make the code more portable (Benoit Steiner, 2018-08-16)
* Fixed the tensor contraction code. (Benoit Steiner, 2018-08-15)
* Fixed compilation errors with gcc 4.7 and 4.8 (Benoit Steiner, 2018-08-14)
* Add block evaluation to CwiseUnaryOp and add PreferBlockAccess enum to all evaluators (Eugene Zhulenev, 2018-08-10)
* Fix initialization order. (Rasmus Munk Larsen, 2018-08-03)
* Enabling per device specialisation of packetsize. (Mehdi Goli, 2018-08-01)
* Merged in ezhulenev/eigen/tiling_3 (pull request PR-438) (Gael Guennebaud, 2018-07-31)
  Tiled tensor executor
* Add tiled evaluation support to TensorExecutor (Eugene Zhulenev, 2018-07-25)
* Reduce the number of template specializations of classes related to tensor contraction to reduce binary size (Rasmus Munk Larsen, 2018-07-27)
* Specify default output kernel for TensorContractionOp (Eugene Zhulenev, 2018-07-18)
* applying EIGEN_DECLARE_TEST to *gpu* tests (Deven Desai, 2018-07-17)
  Also, a few minor fixes for GPU tests running in HIP mode.
  1. Adding an include for hip/hip_runtime.h in the Macros.h file. For HIP, __host__ and __device__ are macros which are defined in the hip headers, so their definitions need to be included before their use in the file (see the include-order sketch after this entry).
  2. Fixing the compile failure in TensorContractionGpu introduced by the commit "Fuse computations into the Tensor contractions using output kernel".
  3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit.
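  A minimal illustration of point 1 (include order, assuming a HIP toolchain):

      #include <hip/hip_runtime.h>  // provides the __host__ / __device__ macros

      // Declarations below now see the HIP definitions of the annotations:
      __host__ __device__ inline float twice(float x) { return 2.0f * x; }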
* Call OutputKernel in evalGemv (Eugene Zhulenev, 2018-07-12)
* Fuse computations into the Tensor contractions using output kernel (Eugene Zhulenev, 2018-07-10)
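  A hedged usage sketch of the output-kernel hook: a functor invoked on each finished block of the contraction output, so an epilogue (here a ReLU) is fused into the contraction instead of requiring a second pass over the result. Parameter types are kept generic; consult TensorContraction.h for the exact output-mapper and params types.

      struct ReluOutputKernel {
        // Called once per finished output block; output(r, c) addresses the
        // block's elements in place.
        template <typename OutputMapper, typename Params, typename Index>
        void operator()(const OutputMapper& output, const Params& /*params*/,
                        Index /*i*/, Index /*j*/,
                        Index num_rows, Index num_cols) const {
          for (Index c = 0; c < num_cols; ++c)
            for (Index r = 0; r < num_rows; ++r)
              output(r, c) = output(r, c) > 0 ? output(r, c) : 0;
        }
      };

      // Assumed usage: passed as a trailing argument to the contraction, e.g.
      //   auto expr = lhs.contract(rhs, contract_dims, ReluOutputKernel());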
* Adding support for using Eigen in HIP kernels. (Deven Desai, 2018-06-06)
  This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.
  Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during the Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor).
  Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
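  Per the description above, opting in is a single define ahead of the includes; a minimal sketch:

      #define EIGEN_USE_HIP  // must precede the Eigen includes in HIP code
      #include <unsupported/Eigen/CXX11/Tensor>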
* Fixed syntax errors generated by xcode (Benoit Steiner, 2017-07-09)
* Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10) (Benoit Steiner, 2017-06-28)
  DataDependancy
  * Wrapping the data type to the pointer class for SYCL in non-terminal nodes; not having that breaks TensorFlow Conv2d code.
  * Applying Ronnan's comments.
  * Applying Benoit's comments.
* Pulled latest updates from upstream (Benoit Steiner, 2017-02-10)
* Silenced several compilation warnings (Benoit Steiner, 2017-02-01)
* Fixing compiler error on TensorContractionSycl.h; silencing the compiler unused-parameter warning for eval_op_indices in TensorContraction.h (Mehdi Goli, 2017-01-31)
* Merge latest changes from upstream (Benoit Steiner, 2017-01-30)
* Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. (Rasmus Munk Larsen, 2017-01-26)
  Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy. This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.