Commit message | Author | Age | ||
---|---|---|---|---|
... | ||||
* | Fix trivial shadow warning | Christoph Hertzberg | 2019-12-19 | |
| | ||||
* | Fix TensorPadding bug in squeezed reads from inner dimension | Eugene Zhulenev | 2019-12-19 | |
| | ||||
* | Return const data pointer from TensorRef evaluator.data() | Eugene Zhulenev | 2019-12-18 | |
| | ||||
* | Tensor block evaluation cost model | Eugene Zhulenev | 2019-12-18 | |
| | ||||
* | Reduce block evaluation overhead for small tensor expressions | Eugene Zhulenev | 2019-12-17 | |
| | ||||
* | Initialize non-trivially constructible types when allocating a temp buffer. | Eugene Zhulenev | 2019-12-12 | |
| | ||||
* | Squeeze reads from two inner dimensions in TensorPadding | Eugene Zhulenev | 2019-12-11 | |
| | ||||
* | Add back accidentally deleted default constructor to TensorExecutorTilingContext. | Eugene Zhulenev | 2019-12-11 | |
| | ||||
* | Remove block memory allocation required by removed block evaluation API | Eugene Zhulenev | 2019-12-10 | |
| | ||||
* | Remove V2 suffix from TensorBlock | Eugene Zhulenev | 2019-12-10 | |
| | ||||
* | Remove TensorBlock.h and old TensorBlock/BlockMapper | Eugene Zhulenev | 2019-12-10 | |
| | ||||
* | Fix for HIP breakage detected on 191210 | Deven Desai | 2019-12-10 | |
| | | | | The commit https://gitlab.com/libeigen/eigen/commit/2918f85ba976dbfbf72f7d4c1961a577f5850148 introduces compile errors when building Eigen with hipcc. hipcc errors out because it requires the device attribute on the methods within the TensorBlockV2ResourceRequirements struct introduced by that commit. The fix is to add the device attribute to those methods. | |||
* | Do not use std::vector in getResourceRequirements | Eugene Zhulenev | 2019-12-09 | |
| | ||||
* | Undo the block size change. | Artem Belevich | 2019-12-09 | |
| | | | | .z *is* used by the EigenContractionKernelInternal(). | |||
* | Add async evaluation support to TensorSelectOp | Eugene Zhulenev | 2019-12-09 | |
| | ||||
* | Add recursive work splitting to EvalShardedByInnerDimContext | Eugene Zhulenev | 2019-12-05 | |
| | ||||
* | Improve performance of contraction kernels | Artem Belevich | 2019-12-05 | |
| | | | | * Force-inline implementations. They pass around pointers to shared memory blocks. Without inlining, the compiler must operate via generic pointers. Inlining lets the compiler detect that we're operating on shared memory, which allows generation of substantially faster code. * Fixed a long-standing typo which resulted in launching 8x more kernels than we needed (.z dimension of the block is unused by the kernel). | |||
* | Capture TensorMap by value inside tensor expression AST | Eugene Zhulenev | 2019-12-03 | |
| | ||||
* | Remove __host__ annotation for device-only function. | Rasmus Munk Larsen | 2019-12-03 | |
| | ||||
* | Use EIGEN_DEVICE_FUNC macro instead of __device__. | Rasmus Munk Larsen | 2019-12-03 | |
| | ||||
* | [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch. | Mehdi Goli | 2019-11-28 | |
| | | | | * Unifying all loadLocalTile from lhs and rhs to an extract_block function. * Adding get_tensor operation which was missing in TensorContractionMapper. * Adding the -D method missing from cmake for Disable_Skinny Contraction operation. * Wrapping all the indices in TensorScanSycl into Scan parameter struct. * Fixing typo in Device SYCL. * Unifying load to private register for tall/skinny no shared. * Unifying load to vector tile for tensor-vector/vector-tensor operation. * Removing all the LHS/RHS class for extracting data from global. * Removing Outputfunction from TensorContractionSkinnyNoshared. * Combining the local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel. * Combining General Tensor-Vector and VectorTensor contraction into one kernel. * Making double buffering optional for Tensor contraction when the local memory version is used. * Modifying benchmark to accept custom Reduction Sizes. * Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host. * Adding Test for SYCL. * Modifying SYCL CMake. | |||
* | Add async evaluation support to TensorReverse | Eugene Zhulenev | 2019-11-26 | |
| | ||||
* | Add async evaluation support to TensorPadding/TensorImagePatch/TensorShuffling | Eugene Zhulenev | 2019-11-26 | |
| | ||||
* | Remove legacy block evaluation support | Eugene Zhulenev | 2019-11-12 | |
| | ||||
* | Fix a race in async tensor evaluation: Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs. | Rasmus Munk Larsen | 2019-11-11 | |
| | ||||
* | Break loop dependence in TensorGenerator block access | Eugene Zhulenev | 2019-11-11 | |
| | ||||
* | Add EIGEN_HAS_INTRINSIC_INT128 macro | Rasmus Munk Larsen | 2019-11-06 | |
| | | | | Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice. | |||
* | Rollback of PR-746 and partial rollback of https://bitbucket.org/eigen/eigen/commits/668ab3fc474e54c7919eda4fbaf11f3a99246494 . | Rasmus Munk Larsen | 2019-11-05 | |
| | | | | std::array is still not supported in CUDA device code on Windows. | |||
* | Remove internal::smart_copy and replace with std::copy | Eugene Zhulenev | 2019-10-29 | |
| | ||||
* | Prevent potential ODR in TensorExecutor | Eugene Zhulenev | 2019-10-28 | |
| | ||||
* | Merged in deven-amd/eigen-hip-fix-191018 (pull request PR-738) | Rasmus Larsen | 2019-10-22 | |
|\ | | | | | | | Fix for the HIP build+test errors. | |||
* | | Add block evaluation V2 to TensorAsyncExecutor. | Rasmus Munk Larsen | 2019-10-22 | |
| | | | | | | | | Add async evaluation to a number of ops. | |||
| * | Fix for the HIP build+test errors. | Deven Desai | 2019-10-22 | |
| | | | | | | | | The errors were introduced by this commit : After the above-mentioned commit, some of the tests started failing with the following error: ``` Built target cxx11_tensor_reduction Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16: In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117: /home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible DestinationBufferKind m_kind; ^ /home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible DestinationBuffer m_destination; ^ ``` For some reason HIPCC does not like device code to contain enum types which do not have the base type explicitly declared. The fix is trivial: explicitly state "int" as the base type. | |||
* | | Drop support for C++03 in Eigen tensor. Get rid of some code used to emulate C++11 functionality with older compilers. | Rasmus Munk Larsen | 2019-10-18 | |
|/ | ||||
* | Propagate block evaluation preference through rvalue tensor expressions | Eugene Zhulenev | 2019-10-17 | |
| | ||||
* | Cleanup Tensor block destination and materialized block storage allocation | Eugene Zhulenev | 2019-10-16 | |
| | ||||
* | TensorBroadcasting support for random/uniform blocks | Eugene Zhulenev | 2019-10-16 | |
| | ||||
* | Block evaluation for TensorGenerator/TensorReverse/TensorShuffling | Eugene Zhulenev | 2019-10-14 | |
| | ||||
* | Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op | Eugene Zhulenev | 2019-10-10 | |
| | ||||
* | Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing | Eugene Zhulenev | 2019-10-09 | |
| | ||||
* | Add block evaluation to TensorEvalTo and fix a few small bugs | Eugene Zhulenev | 2019-10-07 | |
| | ||||
* | Fixing incorrect size in Tensor documentation. | Brian Zhao | 2019-10-04 | |
| | ||||
* | Fix compilation warnings and errors with clang in TensorBlockV2 code and tests | Eugene Zhulenev | 2019-10-04 | |
| | ||||
* | Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect | Eugene Zhulenev | 2019-10-02 | |
| | ||||
* | Add beta to TensorContractionKernel and make memset optional | Eugene Zhulenev | 2019-10-02 | |
| | ||||
* | Fix compilation warnings and errors with clang in TensorBlockV2 | Eugene Zhulenev | 2019-09-25 | |
| | ||||
* | Fix a bug in a packed block type in TensorContractionThreadPool | Eugene Zhulenev | 2019-09-24 | |
| | ||||
* | Choose TensorBlock StridedLinearCopy type statically | Eugene Zhulenev | 2019-09-24 | |
| | ||||
* | Add new TensorBlock api implementation + tests | Eugene Zhulenev | 2019-09-24 | |
| | ||||
* | Tensor block evaluation V2 support for unary/binary/broadcasting | Eugene Zhulenev | 2019-09-24 | |
| |
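
The HIP fix dated 2019-10-22 in the log above describes giving an enum an explicit base type so that HIPCC accepts it in device code. The following is a minimal sketch of that C++11 pattern only. The type names echo those in the quoted error log, but the enumerators, the struct layout, and the helpers `kind_as_int` and `demo` are hypothetical and not taken from Eigen.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the enum named in the error log.
// The enumerator names are assumptions, not copied from Eigen.
enum DestinationBufferKind : int {  // explicit underlying type (C++11)
  kEmpty,
  kContiguous,
  kStrided
};

// Hypothetical holder, loosely modelled on the `m_kind` field in the log.
struct DestinationBuffer {
  DestinationBufferKind m_kind;
};

// With a fixed underlying type, the enum's representation is `int`
// on every compiler instead of being implementation-chosen.
static_assert(
    std::is_same<std::underlying_type<DestinationBufferKind>::type, int>::value,
    "DestinationBufferKind must be backed by int");

// Illustrative helper: read the stored kind as a plain integer.
inline int kind_as_int(const DestinationBuffer& buffer) {
  return static_cast<int>(buffer.m_kind);
}

// Small usage demo: stores the third enumerator and reads it back.
inline int demo() {
  DestinationBuffer buffer{kStrided};
  assert(kind_as_int(buffer) == 2);  // enumerators count from 0
  return kind_as_int(buffer);
}
```

Without the `: int` part, an unscoped enum's underlying type is chosen by the implementation, which is presumably what the device compiler objected to. The sketch does not attempt to reproduce the HIPCC diagnostic itself.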