Commit message (Author, Age)
* Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h (Mehdi Goli, 2018-08-01)
|
* Distinguishing between internal memory allocation/deallocation from explicit user memory allocation/deallocation. (Mehdi Goli, 2018-08-01)
|
* Merged in yuefengz/eigen (pull request PR-370) (Benoit Steiner, 2018-07-31)
|\    Use device's allocate function instead of internal::aligned_malloc.
* \ Merged in ezhulenev/eigen/tiling_3 (pull request PR-438) (Gael Guennebaud, 2018-07-31)
|\ \    Tiled tensor executor
* | | Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437. (Gael Guennebaud, 2018-07-31)
| | |
* | | bug #1577: fix msvc compilation of unit test, msvc defines ptrdiff_t as long long (Gael Guennebaud, 2018-07-30)
| | |
| * | Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible (Eugene Zhulenev, 2018-07-27)
| | |
| * | Add tiled evaluation support to TensorExecutor (Eugene Zhulenev, 2018-07-25)
| | |
* | | bug #1578: Improve prefetching in matrix multiplication on MIPS. (Alexey Frunze, 2018-07-24)
| | |
* | | Fix two small typos in the documentation (Patrik Huber, 2018-07-26)
| | |
* | | Merged in rmlarsen/eigen1 (pull request PR-441) (Gael Guennebaud, 2018-07-30)
|\ \ \    Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
* | | | Re-enable FMA for fast sqrt functions (Mark D Ryan, 2018-07-30)
| | | |
* | | | Re-enable FMA for fast sqrt functions (Mark D Ryan, 2018-07-30)
| | | |    This commit re-enables the use of FMA for the FAST sqrt functions. Doing so improves the performance of both algorithms. The float32 version now runs at 88% of the speed of the original function, while the double version runs at 90%.
| * | | Reduce the number of template specializations of classes related to tensor contraction to reduce binary size. (Rasmus Munk Larsen, 2018-07-27)
| | | |
| | * | TensorBlockIO (Eugene Zhulenev, 2018-07-23)
| | | |
* | | | Fix AVX512 implementations of psqrt (Mark D Ryan, 2018-06-25)
|/ / /    This commit fixes the AVX512 implementations of psqrt in the same way that 3ed67cb0bb4af65fbf243df598604a8c7630bf7d fixed the AVX2 version of this function. The AVX512 versions of psqrt incorrectly return -0.0 for negative values, instead of NaN. Fixing the issues requires adding some additional instructions that slow down the algorithms. A similar test to the one used in 3ed67cb0bb4af65fbf243df598604a8c7630bf7d shows that the corrected Packet16f code runs at 73% of the speed of the existing code, while the corrected Packet8d function runs at 68% of the original.
* | | Add pcast packet op for NEON. (Rasmus Munk Larsen, 2018-07-26)
| | |
* | | Disable static assertions only when necessary and disable double-promotion warnings in that case as well (Christoph Hertzberg, 2018-07-26)
| | |
* | | fix warnings for doc-eigen-prerequisites (Christoph Hertzberg, 2018-07-24)
| | |
* | | Removed several shadowing types and use global Index typedef everywhere (Christoph Hertzberg, 2018-07-25)
| | |
* | | Rename variable which shadows class name (Christoph Hertzberg, 2018-07-25)
| | |
* | | Account for missing change on commit "Remove SimpleThreadPool and..." (Gustavo Lima Chaves, 2018-07-23)
| | |    "... always use {NonBlocking}ThreadPool". It seems the non-blocking implementation was made the default/only one, but a reference to the old name was left unmodified. Fix that.
* | | Fixed issue which made documentation not getting built anymore (Christoph Hertzberg, 2018-07-24)
| | |
* | | Allow to filter out build-error messages (Christoph Hertzberg, 2018-07-24)
|/ /
* | Initial support of TensorBlock (Eugene Zhulenev, 2018-07-20)
| |
* | Merged in glchaves/eigen (pull request PR-433) (Gael Guennebaud, 2018-07-23)
|\ \    Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block
* | | fix typo (Gael Guennebaud, 2018-07-23)
| | |
* | | Add lastN shortcuts to seq/seqN. (Gael Guennebaud, 2018-07-23)
| | |
| * | Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block (Gustavo Lima Chaves, 2018-07-20)
| | |    Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail right away without this, as this test seems to rely on those language features. The skip under compilation with MSVC was kept.
* | | Disable type traits for stdlibc++ <= 4.9.3 (Eugene Zhulenev, 2018-07-20)
|/ /
* | Oops, EIGEN_COMP_MSVC is not available before including Eigen. (Gael Guennebaud, 2018-07-20)
| |
* | Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build. (Gael Guennebaud, 2018-07-20)
| |
* | PR430: Convert count to the reducer type in MeanReducer (Eugene Zhulenev, 2018-07-19)
| |    Without explicit conversion TensorFlow fails to compile, pset1 template deduction fails. cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this) ->Eigen::internal::MeanReducer<Eigen::half>::packetCount_' (type 'const DenseIndex {aka const long int}') to type 'const type& {aka const Eigen::half&}' return pdiv(vaccum, pset1<Packet>(packetCount_)); Honestly I’m not sure why it works in Eigen tests, because Eigen::half constructor is explicit, and why it stopped working in TF, I didn’t find any relevant changes since previous Eigen upgrade. static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.
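
The compile error quoted in PR430 is the general behaviour of an explicit constructor: an integer count cannot bind to a `const T&` parameter when `T` is only explicitly constructible from a number. The sketch below reproduces that situation with stand-in names. `Half`, `set1` and `mean_from_sum` are hypothetical and are not Eigen's `Eigen::half`, `pset1` or `MeanReducer`, and the conversion shown is one plausible spelling, not necessarily the one the commit uses.

```cpp
#include <type_traits>

// Hypothetical stand-in for a half-precision scalar whose constructor is explicit.
struct Half {
  float v;
  explicit Half(float x) : v(x) {}
};

inline Half operator/(const Half& a, const Half& b) { return Half(a.v / b.v); }

// Hypothetical stand-in for a broadcast helper that takes the scalar by const reference.
template <typename T>
T set1(const T& x) { return x; }

// Divide an accumulated sum by an integer count.
template <typename T>
T mean_from_sum(const T& sum, long count) {
  // set1<T>(count) would not compile for T = Half: binding `long` to
  // `const Half&` needs an implicit conversion, and the constructor is explicit.
  // Converting the count to T first makes the call well-formed for both types.
  return sum / set1<T>(T(count));
}
```

With these definitions, `mean_from_sum(6.0f, 3)` and `mean_from_sum(Half(6.0f), 3)` both compile and yield 2, while `std::is_convertible<long, Half>::value` is false.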
* | Pass by const ref. (Gael Guennebaud, 2018-07-19)
| |
* | Fix IsRelocatable without C++11 (Gael Guennebaud, 2018-07-19)
| |
* | Fix determination of EIGEN_HAS_TYPE_TRAITS (Gael Guennebaud, 2018-07-19)
| |
* | Fix stupid error in Quaternion move ctor (Gael Guennebaud, 2018-07-19)
| |
* | bug #1558: fix a corner case in MINRES when both v_new and w_new vanish. (David Hyde, 2018-07-08)
| |
* | Reduce number of allocations in TensorContractionThreadPool. (Eugene Zhulenev, 2018-07-16)
| |
* | bug #1569: fix Tensor<half>::mean() on AVX with respective unit test. (Gael Guennebaud, 2018-07-19)
| |
* | Add MIPS changes missing from previous merge. (Alexey Frunze, 2018-07-18)
| |
* | Assert that no output kernel is defined for GPU contraction (Eugene Zhulenev, 2018-07-18)
| |
* | Disable type traits for GCC < 5.1.0 (Eugene Zhulenev, 2018-07-18)
| |
* | Specify default output kernel for TensorContractionOp (Eugene Zhulenev, 2018-07-18)
| |
* | Add regression for bugs #1573 and #1575 (Gael Guennebaud, 2018-07-18)
| |
* | bug #1432: fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable (Gael Guennebaud, 2018-07-18)
| |
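
The strategy described for bug #1432 (use realloc only when the bytes may be moved, otherwise allocate new, copy, delete) can be sketched in isolation. This is a minimal illustration under stated assumptions, not Eigen's code: `resize_buffer`, `resize_impl` and `SelfRef` are hypothetical names, the buffer is assumed to come from `std::malloc`, `T` is assumed default-constructible, `new_n` is assumed non-zero, and `std::is_trivially_copyable` alone stands in for the conservative relocatability test.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Example of a non-relocatable type: it stores a pointer into itself,
// so moving its bytes with realloc would leave `self` dangling.
struct SelfRef {
  int value;
  int* self;
  SelfRef() : value(0), self(&value) {}
  SelfRef(const SelfRef& o) : value(o.value), self(&value) {}
  SelfRef& operator=(const SelfRef& o) { value = o.value; return *this; }
};

// Trivially copyable: the bytes can be moved, so realloc is safe.
template <typename T>
T* resize_impl(T* p, std::size_t old_n, std::size_t new_n, std::true_type) {
  void* raw = std::realloc(p, new_n * sizeof(T));
  if (!raw) throw std::bad_alloc();
  T* q = static_cast<T*>(raw);
  for (std::size_t i = old_n; i < new_n; ++i) q[i] = T();  // value-initialize new tail
  return q;
}

// Not trivially copyable: allocate new storage, copy-construct, destroy the old objects.
template <typename T>
T* resize_impl(T* p, std::size_t old_n, std::size_t new_n, std::false_type) {
  T* q = static_cast<T*>(std::malloc(new_n * sizeof(T)));
  if (!q) throw std::bad_alloc();
  const std::size_t keep = old_n < new_n ? old_n : new_n;
  std::size_t i = 0;
  try {
    for (; i < keep; ++i) new (q + i) T(p[i]);  // copy the elements that survive
    for (; i < new_n; ++i) new (q + i) T();     // default-construct the new tail
  } catch (...) {
    while (i > 0) q[--i].~T();                  // roll back on a throwing constructor
    std::free(q);
    throw;
  }
  for (std::size_t j = 0; j < old_n; ++j) p[j].~T();
  std::free(p);
  return q;
}

// Dispatch on the conservative trait.
template <typename T>
T* resize_buffer(T* p, std::size_t old_n, std::size_t new_n) {
  return resize_impl(p, old_n, new_n, std::is_trivially_copyable<T>());
}
```

For a `SelfRef` buffer, every element returned by `resize_buffer` still satisfies `elem.self == &elem.value`, which a plain realloc could not guarantee.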
* | Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests. (Gael Guennebaud, 2018-07-18)
| |
* | bug #1575: fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted. (Gael Guennebaud, 2018-07-18)
| |
* | More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA (Gael Guennebaud, 2018-07-18)
| |
| * Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances. (Yuefeng Zhou, 2018-02-20)
| |