aboutsummaryrefslogtreecommitdiffhomepage
path: root/Eigen/Core
diff options
context:
space:
mode:
authorGravatar Antonio Sanchez <cantonios@google.com>2021-02-03 08:18:28 -0800
committerGravatar Antonio Sanchez <cantonios@google.com>2021-02-03 09:01:48 -0800
commitf85038b7f3e9a0bd7d2bfbed96cc966863aeea57 (patch)
treea890999030a9b7b22f0091ba5185b1a58d06d550 /Eigen/Core
parent56c8b14d875ae42a52d0da52916fac1e29305ca7 (diff)
Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM, leading to excessive 16-byte register spills, slowing down basic f32 matrix multiplication by approx 50%. By specializing `gebp_traits`, we can eliminate the register spills. Volatile inline ASM both acts as a barrier to prevent reordering and enforces strict register use. In a simple f32 matrix multiply example, this modification reduces 16-byte spills from 109 instances to zero, leading to a 1.5x speed increase (search for `16-byte Spill` in the assembly in https://godbolt.org/z/chsPbE). This is a replacement of !379. See there for further discussion. Also moved `gebp_traits` specializations for NEON to `Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside other NEON-specific code. Fixes #2138.
Diffstat (limited to 'Eigen/Core')
-rw-r--r--Eigen/Core4
1 files changed, 3 insertions, 1 deletions
diff --git a/Eigen/Core b/Eigen/Core
index 4d9a3309c..1a60dcba4 100644
--- a/Eigen/Core
+++ b/Eigen/Core
@@ -340,7 +340,9 @@ using std::ptrdiff_t;
#include "src/Core/ConditionEstimator.h"
#if defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
-#include "src/Core/arch/AltiVec/MatrixProduct.h"
+ #include "src/Core/arch/AltiVec/MatrixProduct.h"
+#elif defined EIGEN_VECTORIZE_NEON
+ #include "src/Core/arch/NEON/GeneralBlockPanelKernel.h"
#endif
#include "src/Core/BooleanRedux.h"