author | Rasmus Munk Larsen <rmlarsen@google.com> | 2019-01-31 14:24:08 -0800 |
---|---|---|
committer | Rasmus Munk Larsen <rmlarsen@google.com> | 2019-01-31 14:24:08 -0800 |
commit | 4c0fa6ce0f81ce67dd6723528ddf72f66ae92ba2 (patch) | |
tree | 5b413c61d9fa51dff9973b60fe5b6c25aad4c5d3 /unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h | |
parent | 7ef879f6bfa465a80109216e6d0b18266ef97321 (diff) |
Speed up Eigen matrix*vector and vector*matrix multiplication.
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
The benchmarks below, BM_MatVec and BM_VecMat, test
c.noalias() = n_by_n_matrix * n_by_1_matrix;
c.noalias() = 1_by_n_matrix * n_by_n_matrix;
respectively.
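As an illustration of the idea only, the runtime check can be sketched in plain C++ as below. This is not Eigen's implementation: the `Matrix` struct and the `gemm`, `gemv`, `vecmat` and `multiply` functions are hypothetical names made up for this sketch, the kernels are naive loops, and the operands are assumed to have compatible sizes (a.cols == b.rows).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix with runtime sizes (not an Eigen type).
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;  // rows * cols entries, row-major
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// General matrix * matrix kernel (naive triple loop).
Matrix gemm(const Matrix& a, const Matrix& b) {
  Matrix c(a.rows, b.cols);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = 0; k < a.cols; ++k)
      for (std::size_t j = 0; j < b.cols; ++j)
        c(i, j) += a(i, k) * b(k, j);
  return c;
}

// Matrix * column vector kernel: one dot product per row of a.
Matrix gemv(const Matrix& a, const Matrix& x) {
  Matrix y(a.rows, 1);
  for (std::size_t i = 0; i < a.rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.cols; ++k) sum += a(i, k) * x(k, 0);
    y(i, 0) = sum;
  }
  return y;
}

// Row vector * matrix kernel: accumulate scaled rows of b.
Matrix vecmat(const Matrix& x, const Matrix& b) {
  Matrix y(1, b.cols);
  for (std::size_t k = 0; k < b.rows; ++k) {
    const double xk = x(0, k);
    for (std::size_t j = 0; j < b.cols; ++j) y(0, j) += xk * b(k, j);
  }
  return y;
}

// Runtime dispatch: pick a vector kernel when one factor turns out to be
// a vector, even though both operands have dynamic (runtime) sizes.
Matrix multiply(const Matrix& a, const Matrix& b) {
  if (b.cols == 1) return gemv(a, b);    // n_by_n * n_by_1
  if (a.rows == 1) return vecmat(a, b);  // 1_by_n * n_by_n
  return gemm(a, b);
}
```

The point of the sketch is only that the shape test happens once per product at runtime, so a dynamic matrix that happens to be a vector can still reach a specialized kernel.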
Benchmark measurements:
SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark         Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_MatVec/64           1096         312        +71.5%
BM_MatVec/128          4581        1464        +68.0%
BM_MatVec/256         18534        5710        +69.2%
BM_MatVec/512        118083       24162        +79.5%
BM_MatVec/1k         704106      173346        +75.4%
BM_MatVec/2k        3080828      742728        +75.9%
BM_MatVec/4k       25421512     4530117        +82.2%
BM_VecMat/32            352         130        +63.1%
BM_VecMat/64           1213         425        +65.0%
BM_VecMat/128          4640        1564        +66.3%
BM_VecMat/256         17902        5884        +67.1%
BM_VecMat/512         70466       24000        +65.9%
BM_VecMat/1k         340150      161263        +52.6%
BM_VecMat/2k        1420590      645576        +54.6%
BM_VecMat/4k        8083859     4364327        +46.0%
AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark         Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_MatVec/64            619         120        +80.6%
BM_MatVec/128          9693         752        +92.2%
BM_MatVec/256         38356        2773        +92.8%
BM_MatVec/512         69006       12803        +81.4%
BM_MatVec/1k         443810      160378        +63.9%
BM_MatVec/2k        2633553      646594        +75.4%
BM_MatVec/4k       16211095     4327148        +73.3%
BM_VecMat/64            925         227        +75.5%
BM_VecMat/128          3438         830        +75.9%
BM_VecMat/256         13427        2936        +78.1%
BM_VecMat/512         53944       12473        +76.9%
BM_VecMat/1k         302264      157076        +48.0%
BM_VecMat/2k        1396811      675778        +51.6%
BM_VecMat/4k        8962246     4459010        +50.2%
AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark         Base (ns)    New (ns)   Improvement
------------------------------------------------------------------
BM_MatVec/64            401         111        +72.3%
BM_MatVec/128          1846         513        +72.2%
BM_MatVec/256         36739        1927        +94.8%
BM_MatVec/512         54490        9227        +83.1%
BM_MatVec/1k         487374      161457        +66.9%
BM_MatVec/2k        2016270      643824        +68.1%
BM_MatVec/4k       13204300     4077412        +69.1%
BM_VecMat/32            324         106        +67.3%
BM_VecMat/64           1034         246        +76.2%
BM_VecMat/128          3576         802        +77.6%
BM_VecMat/256         13411        2561        +80.9%
BM_VecMat/512         58686       10037        +82.9%
BM_VecMat/1k         320862      163750        +49.0%
BM_VecMat/2k        1406719      651397        +53.7%
BM_VecMat/4k        7785179     4124677        +47.0%
Diffstat (limited to 'unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h')
0 files changed, 0 insertions, 0 deletions