| Commit message | Author | Age |
... | |
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
request PR-448)
Adding new arch/SYCL headers, used for SYCL vectorization.
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
request PR-449)
Enabling per device specialisation of packetSize.
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
(pull request PR-445)
Replacing the ad-hoc inline keyword with the EIGEN_STRONG_INLINE macro.
|
| | | | | | | |
|
| |/ / / / /
|/| | | | | |
|
| |/ / / /
|/| | | | |
|
| |_|/ /
|/| | | |
|
| |/ /
|/| |
| | |
| | | |
user memory allocation/deallocation.
|
|/ / |
|
|\ \
| | |
| | |
| | | |
Use device's allocate function instead of internal::aligned_malloc.
|
| | | |
|
|\ \ \
| | | |
| | | |
| | | | |
Tiled tensor executor
|
| | | |
| | | |
| | | |
| | | | |
PR 437.
|
| | | |
| | | |
| | | |
| | | | |
long
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | | |
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms. The float32
version is now 88% of the speed of the original function, while the
double version is 90%.
|
| | | | |
| | | | |
| | | | |
| | | | | |
contraction to reduce binary size.
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When supplied, this allocator will be used in place of
internal::aligned_malloc. This permits e.g. use of a NUMA-node specific
allocator where the thread-pool is also restricted to a single NUMA-node.
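
The fallback behaviour described above can be sketched in isolation. This is a minimal illustration, not Eigen's actual code: the names `Allocator`, `CountingAllocator` and `Device` are hypothetical, and `std::malloc` stands in for internal::aligned_malloc. The device uses the supplied allocator when one is given and the default otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator interface (illustration only, not Eigen's API).
class Allocator {
 public:
  virtual ~Allocator() = default;
  virtual void* allocate(std::size_t num_bytes) const = 0;
  virtual void deallocate(void* buffer) const = 0;
};

// Stand-in for a NUMA-node-specific allocator: it only counts live buffers.
class CountingAllocator : public Allocator {
 public:
  void* allocate(std::size_t num_bytes) const override {
    ++live_;
    return std::malloc(num_bytes);
  }
  void deallocate(void* buffer) const override {
    --live_;
    std::free(buffer);
  }
  int live() const { return live_; }

 private:
  mutable int live_ = 0;
};

// Hypothetical device: routes through the supplied allocator when non-null,
// otherwise falls back to the default (std::malloc here).
class Device {
 public:
  explicit Device(const Allocator* allocator = nullptr)
      : allocator_(allocator) {}

  void* allocate(std::size_t num_bytes) const {
    return allocator_ ? allocator_->allocate(num_bytes)
                      : std::malloc(num_bytes);
  }
  void deallocate(void* buffer) const {
    if (allocator_) {
      allocator_->deallocate(buffer);
    } else {
      std::free(buffer);
    }
  }

 private:
  const Allocator* allocator_;
};

// Returns the number of live buffers seen by the custom allocator
// while one device allocation is outstanding.
int live_during_allocation() {
  CountingAllocator counting;
  Device device(&counting);
  void* p = device.allocate(64);
  int live = counting.live();
  device.deallocate(p);
  assert(counting.live() == 0);
  return live;
}
```

Passing the allocator by pointer with a null default keeps existing call sites unchanged, which matches the "when supplied" wording of the commit message.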
|
|/ / / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb4af65fbf243df598604a8c7630bf7d fixed the AVX2
version of this function. The AVX512 versions of psqrt incorrectly
return -0.0 for negative values, instead of NaN. Fixing the issue
requires adding some additional instructions that slow down the
algorithms. A test similar to the one used in
3ed67cb0bb4af65fbf243df598604a8c7630bf7d shows that the corrected
Packet16f code runs at 73% of the speed of the existing code, while
the corrected Packet8d function runs at 68% of the original.
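
A scalar sketch of how a fast sqrt can end up returning -0.0 for negative input, and of the kind of extra check that restores NaN. The mechanism is an assumption made for illustration (sqrt computed as x * rsqrt(x), with rsqrt forced to zero for inputs below the smallest normal float). The commit message does not spell out the actual AVX512 code path, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed buggy scheme: inputs below the smallest normal float are treated
// as zero by forcing the reciprocal square root to 0. A negative x then
// yields x * 0 = -0.0 rather than NaN.
float sqrt_flush_bug(float x) {
  float r = (x < std::numeric_limits<float>::min()) ? 0.0f
                                                    : 1.0f / std::sqrt(x);
  return x * r;
}

// Corrected scheme: one extra comparison sends negative inputs to NaN.
// The additional work is the kind of cost the commit message refers to.
float sqrt_fixed(float x) {
  if (x < 0.0f) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return sqrt_flush_bug(x);
}

// Self-check of the two behaviours on a negative input.
void check_negative_input() {
  float bad = sqrt_flush_bug(-2.0f);
  assert(bad == 0.0f && std::signbit(bad));  // -0.0
  assert(std::isnan(sqrt_fixed(-2.0f)));     // NaN
}
```

In a vectorized version the branch would typically become a compare and a blend over the whole packet, which is why the fix cannot be free.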
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
warnings in that case as well
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was made the default/only one, but a reference to the old
name was left unmodified. Fix that.
|
| | | | |
|
|/ / / |
|
| | | |
|
|\ \ \
| | | |
| | | |
| | | | |
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block
|
| | | | |
|
| | | | |
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
block
Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
|
|/ / / |
|
| | | |
|
| | |
| | |
| | |
| | | |
it takes several hours to build.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Without an explicit conversion, TensorFlow fails to compile because pset1 template deduction fails:
cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly, I'm not sure why this works in the Eigen tests, given that the Eigen::half constructor is explicit, nor why it stopped working in TF; I found no relevant changes since the previous Eigen upgrade.
Using static_cast<T>(packetCount_) breaks the cxx11_tensor_reductions test for Eigen::half, which is also quite surprising.
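
The compile error quoted above comes down to an explicit constructor blocking an implicit integer-to-half conversion at a call that expects `const type&`. A minimal sketch of that situation follows. `Half`, `PacketHalf` and `broadcast_count` are hypothetical stand-ins, not Eigen's types, and the conversion shown is only one possible form. The commit message does not show which form was finally adopted, and it reports that a plain static_cast<T> broke a test in Eigen itself.

```cpp
#include <cassert>

// Hypothetical stand-in for a half-precision scalar with an explicit
// constructor, as the commit message says Eigen::half has.
struct Half {
  float value;
  explicit Half(float v) : value(v) {}
};

// Hypothetical one-lane "packet" and its scalar trait.
struct PacketHalf {
  Half lane;
};

template <typename Packet>
struct unpacket_traits;

template <>
struct unpacket_traits<PacketHalf> {
  typedef Half type;
};

// Broadcast a scalar into a packet. The parameter is `const type&`,
// so the argument must already be (or implicitly convert to) the scalar.
template <typename Packet>
Packet pset1(const typename unpacket_traits<Packet>::type& a) {
  return Packet{a};
}

template <typename Packet>
Packet broadcast_count(long packet_count) {
  typedef typename unpacket_traits<Packet>::type Scalar;
  // pset1<Packet>(packet_count) would be rejected here: long -> const Half&
  // needs an implicit conversion, which the explicit constructor forbids.
  return pset1<Packet>(Scalar(static_cast<float>(packet_count)));
}
```

Calling `broadcast_count<PacketHalf>(4)` yields a packet whose lane holds 4.0f, whereas the commented-out form fails with an error of the same shape as the one in the message.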
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|