author | Antonio Sanchez <cantonios@google.com> | 2021-02-03 08:18:28 -0800
---|---|---
committer | Antonio Sanchez <cantonios@google.com> | 2021-02-03 09:01:48 -0800
commit | f85038b7f3e9a0bd7d2bfbed96cc966863aeea57 |
tree | a890999030a9b7b22f0091ba5185b1a58d06d550 /Eigen/Core |
parent | 56c8b14d875ae42a52d0da52916fac1e29305ca7 |
Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP (general block-panel) microkernel
on 32-bit ARM. The resulting excessive 16-byte register spills slow down basic
f32 matrix multiplication by approximately 50%.
By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline assembly both acts as a barrier that prevents the compiler
from reordering instructions and enforces strict register use. In a simple
f32 matrix multiply example, this modification reduces 16-byte spills from
109 instances to zero, giving a 1.5x speedup (search for `16-byte Spill` in
the assembly at https://godbolt.org/z/chsPbE).
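The two mechanisms described above, a traits specialization and an empty volatile inline-assembly statement, can be sketched in portable C++. This is a minimal illustration under stated assumptions, not the code added by this commit: `GenericArch`, `PinnedArch`, `mac_traits`, `dot`, and `PIN_TO_VECTOR_REG` are hypothetical names. It assumes a GCC-compatible compiler (GCC or Clang), where an empty `__asm__ __volatile__` statement with a read-write register operand forces the value into a register of the given class (`x` for SSE registers on x86-64, `w` for NEON/VFP registers on ARM) and cannot be deleted by the optimizer. Other compilers fall back to a no-op. Whether this actually removes spills depends on the compiler and target; the measurement above is the authoritative one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical architecture tags (illustrative only, not Eigen's names).
struct GenericArch {};
struct PinnedArch {};

// Hypothetical helper: an empty volatile asm statement whose only operand is
// `v`, read and written in a vector/FP register. The compiler must hold `v`
// in such a register at this point and may not delete the statement.
#if defined(__GNUC__) && defined(__x86_64__)
#define PIN_TO_VECTOR_REG(v) __asm__ __volatile__("" : "+x"(v))
#elif defined(__GNUC__) && defined(__ARM_NEON)
#define PIN_TO_VECTOR_REG(v) __asm__ __volatile__("" : "+w"(v))
#else
#define PIN_TO_VECTOR_REG(v) ((void)0)
#endif

// Primary template: plain multiply-accumulate; scheduling is left entirely
// to the compiler.
template <typename Scalar, typename Arch>
struct mac_traits {
  static inline void madd(Scalar a, Scalar b, Scalar& c) { c += a * b; }
};

// Specialization for float on the hypothetical PinnedArch: same arithmetic,
// followed by the empty volatile asm that pins the accumulator.
template <>
struct mac_traits<float, PinnedArch> {
  static inline void madd(float a, float b, float& c) {
    c += a * b;
    PIN_TO_VECTOR_REG(c);
  }
};

// Kernel written once against the traits; the Arch parameter selects which
// madd implementation is used. Two independent accumulators, tail handled
// by the first one.
template <typename Arch>
float dot(const float* a, const float* b, std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr));
  float c0 = 0.f, c1 = 0.f;
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) {
    mac_traits<float, Arch>::madd(a[i], b[i], c0);
    mac_traits<float, Arch>::madd(a[i + 1], b[i + 1], c1);
  }
  if (i < n) mac_traits<float, Arch>::madd(a[i], b[i], c0);
  return c0 + c1;
}
```

Because the kernel is written once against `mac_traits`, changing the `Arch` argument swaps in the pinned `madd` without touching the loop. That is the general idea behind specializing a traits class such as `gebp_traits` for one architecture.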
This replaces !379; see there for further discussion.
Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.
Fixes #2138.
Diffstat (limited to 'Eigen/Core')
-rw-r--r-- | Eigen/Core | 4 |
1 file changed, 3 insertions, 1 deletion
diff --git a/Eigen/Core b/Eigen/Core
index 4d9a3309c..1a60dcba4 100644
--- a/Eigen/Core
+++ b/Eigen/Core
@@ -340,7 +340,9 @@ using std::ptrdiff_t;
 #include "src/Core/ConditionEstimator.h"
 
 #if defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
-#include "src/Core/arch/AltiVec/MatrixProduct.h"
+  #include "src/Core/arch/AltiVec/MatrixProduct.h"
+#elif defined EIGEN_VECTORIZE_NEON
+  #include "src/Core/arch/NEON/GeneralBlockPanelKernel.h"
 #endif
 
 #include "src/Core/BooleanRedux.h"