Commit message | Date
---|---
Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h | 2018-08-01
Distinguishing between internal memory allocation/deallocation and explicit user memory allocation/deallocation. | 2018-08-01
Merged in yuefengz/eigen (pull request PR-370). Use device's allocate function instead of internal::aligned_malloc. | 2018-07-31
Merged in ezhulenev/eigen/tiling_3 (pull request PR-438). Tiled tensor executor | 2018-07-31
Speed up trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437. | 2018-07-31
bug #1577: fix MSVC compilation of unit test; MSVC defines ptrdiff_t as long long | 2018-07-30
Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible | 2018-07-27
Add tiled evaluation support to TensorExecutor | 2018-07-25
bug #1578: Improve prefetching in matrix multiplication on MIPS. | 2018-07-24
Fix two small typos in the documentation | 2018-07-26
Merged in rmlarsen/eigen1 (pull request PR-441). Reduce the number of template specializations of classes related to tensor contraction to reduce binary size. | 2018-07-30
Re-enable FMA for fast sqrt functions | 2018-07-30
Re-enable FMA for fast sqrt functions. This commit re-enables the use of FMA for the FAST sqrt functions. Doing so improves the performance of both algorithms. The float32 version now runs at 88% of the speed of the original function, while the double version runs at 90%. | 2018-07-30
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size. | 2018-07-27
TensorBlockIO | 2018-07-23
Fix AVX512 implementations of psqrt. This commit fixes the AVX512 implementations of psqrt in the same way that 3ed67cb0bb4af65fbf243df598604a8c7630bf7d fixed the AVX2 version of this function. The AVX512 versions of psqrt incorrectly return -0.0 for negative values instead of NaN. Fixing the issue requires adding some additional instructions that slow down the algorithms. A similar test to the one used in 3ed67cb0bb4af65fbf243df598604a8c7630bf7d shows that the corrected Packet16f code runs at 73% of the speed of the existing code, while the corrected Packet8d function runs at 68% of the original. | 2018-06-25
Add pcast packet op for NEON. | 2018-07-26
Disable static assertions only when necessary and disable double-promotion warnings in that case as well | 2018-07-26
fix warnings for doc-eigen-prerequisites | 2018-07-24
Removed several shadowing types and use global Index typedef everywhere | 2018-07-25
Rename variable which shadows class name | 2018-07-25
Account for missing change on commit "Remove SimpleThreadPool and always use {NonBlocking}ThreadPool". It seems the non-blocking implementation was made the default/only one, but a reference to the old name was left unmodified. Fix that. | 2018-07-23
Fix issue that prevented the documentation from being built | 2018-07-24
Allow filtering out build-error messages | 2018-07-24
Initial support of TensorBlock | 2018-07-20
Merged in glchaves/eigen (pull request PR-433). Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block | 2018-07-23
fix typo | 2018-07-23
Add lastN shortcuts to seq/seqN. | 2018-07-23
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block. Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail right away without this, as this test seems to rely on those language features. The skip under compilation with MSVC was kept. | 2018-07-20
Disable type traits for libstdc++ <= 4.9.3 | 2018-07-20
Oops, EIGEN_COMP_MSVC is not available before including Eigen. | 2018-07-20
Disable optimization for sparse_product unit test with MSVC 2013; otherwise it takes several hours to build. | 2018-07-20
PR430: Convert count to the reducer type in MeanReducer. Without an explicit conversion, TensorFlow fails to compile because pset1 template deduction fails: `cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this) ->Eigen::internal::MeanReducer<Eigen::half>::packetCount_' (type 'const DenseIndex {aka const long int}') to type 'const type& {aka const Eigen::half&}'` at `return pdiv(vaccum, pset1<Packet>(packetCount_));`. Honestly, I'm not sure why it works in the Eigen tests, given that the Eigen::half constructor is explicit, or why it stopped working in TF; I didn't find any relevant changes since the previous Eigen upgrade. Using `static_cast<T>(packetCount_)` breaks the cxx11_tensor_reductions test for Eigen::half, which is also quite surprising. | 2018-07-19
Pass by const ref. | 2018-07-19
Fix IsRelocatable without C++11 | 2018-07-19
Fix determination of EIGEN_HAS_TYPE_TRAITS | 2018-07-19
Fix stupid error in Quaternion move ctor | 2018-07-19
bug #1558: fix a corner case in MINRES when both v_new and w_new vanish. | 2018-07-08
Reduce number of allocations in TensorContractionThreadPool. | 2018-07-16
bug #1569: fix `Tensor<half>::mean()` on AVX with corresponding unit test. | 2018-07-19
Add MIPS changes missing from previous merge. | 2018-07-18
Assert that no output kernel is defined for GPU contraction | 2018-07-18
Disable type traits for GCC < 5.1.0 | 2018-07-18
Specify default output kernel for TensorContractionOp | 2018-07-18
Add regression tests for bugs #1573 and #1575 | 2018-07-18
bug #1432: fix conservativeResize for non-relocatable scalar types. For those, we need to bypass the realloc routines and fall back to allocating new storage, copying, and deleting the old storage. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so for now let's be super conservative and use either RequireInitialization or std::is_trivially_copyable. | 2018-07-18
Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests. | 2018-07-18
bug #1575: fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted. | 2018-07-18
More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA | 2018-07-18
Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances. | 2018-02-20