| Commit message (Collapse) | Author | Age |
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
- cleanup noise in imaginary part of real roots
- take into account the magnitude of the derivative to check roots.
- use <= instead of < at appropriate places
|
|\
| |
| |
| |
| |
| | |
Fix tensor contraction on AVX512 builds
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size. While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16. The result is slightly
incorrect float32 contractions on AVX512 builds.
This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
|
| | |
|
|\ \
| | |
| | |
| | |
| | |
| | | |
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
Approved-by: Eugene Zhulenev <ezhulenev@google.com>
|
| | | |
|
| | |
| | |
| | |
| | | |
of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
CMakeLists.txt | 26 +++++++++----------
Eigen/src/Core/GenericPacketMath.h | 2 +-
Eigen/src/SparseLU/SparseLU.h | 2 +-
bench/bench_norm.cpp | 2 +-
doc/HiPerformance.dox | 2 +-
doc/QuickStartGuide.dox | 2 +-
.../Eigen/CXX11/src/Tensor/TensorChipping.h | 6 ++---
.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h | 2 +-
.../src/Tensor/TensorForwardDeclarations.h | 4 +--
.../src/Tensor/TensorGpuHipCudaDefines.h | 2 +-
.../Eigen/CXX11/src/Tensor/TensorReduction.h | 2 +-
.../CXX11/src/Tensor/TensorReductionGpu.h | 2 +-
.../test/cxx11_tensor_concatenation.cpp | 2 +-
unsupported/test/cxx11_tensor_executor.cpp | 2 +-
14 files changed, 29 insertions(+), 29 deletions(-)
|
|/
|
|
|
|
|
|
|
| |
This patch modifies the TensorContraction class to ensure that the kc_ field is
always a multiple of the packet_size, if the packet_size is > 8. Without this
change spatial convolutions in Tensorflow do not work properly as the code that
re-arranges the input matrices can assert if kc_ is not a multiple of the
packet_size. This leads to a unit test failure,
//tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.
|
|
|
|
| |
code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles.
|
|\
| |
| |
| | |
Fix cxx11_tensor_{block_access, reduction} tests
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c61b764c73fd1570b1f3ed1c6400dd7e
|
| |
| |
| |
| |
| |
| |
| | |
find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
Grafted manually from a4afa90d161faab385a77f0e2764fb13ff3b9484
|
|/ |
|
| |
|
| |
|
|\
| |
| |
| |
| |
| | |
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
|
| | |
|
| |
| |
| |
| | |
m_info was not properly computed and the logic was repeated in several places.
|
|/ |
|
| |
|
|
|
|
| |
sparse_basic(DynamicSparseMatrix) does not compile at all anyways
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| |
| | |
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
Changing "pass-by-reference" argument to be "pass-by-value" instead
(in a __global__ function decl).
"pass-by-reference" arguments to __global__ functions are unwise,
and will be explicitly flagged as errors by the newer versions of HIP.
- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
Changes introduced in recent commits breaks the HIP compile.
Adding EIGEN_DEVICE_FUNC attribute to some functions and
calling ::malloc/free instead of the corresponding std:: versions
to get the HIP compile working again
- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Change introduced a recent commit breaks the HIP compile
(link stage errors out due to failure to inline a function).
Disabling the recently introduced code (only for HIP compile), to get
the eigen nightly testing going again.
Will submit another PR once we have te proper fix.
- Eigen/src/Core/util/ConfigureVectorization.h
Enabling GPU VECTOR support when HIP compiler is in use
(for both the host and device compile phases)
|
|\ \ |
|
| | | |
|
|/ / |
|
| | |
|
|\ \
| | |
| | |
| | | |
Add tests for evalShardedByInnerDim contraction + fix bugs
|
| |/ |
|
| | |
|