author | Artem Belevich <tra@google.com> | 2019-12-05 12:48:34 -0800 |
---|---|---|
committer | Artem Belevich <tra@google.com> | 2019-12-05 12:48:34 -0800 |
commit | 25230d1862ecfe3f1bf91c12eefe52dbdc0179b9 (patch) | |
tree | 3db318567c010c65bf9539332d3fce38bff7fe18 /unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h | |
parent | 08eeb648ea6c329b9b1fb3063993572c21404974 (diff) |
Improve performance of contraction kernels
* Force-inline implementations. They pass around pointers to shared memory
blocks. Without inlining, the compiler must operate via generic pointers.
Inlining allows the compiler to detect that we're operating on shared memory,
which allows generation of substantially faster code.
* Fixed a long-standing typo which resulted in launching 8x more kernels
than we needed (.z dimension of the block is unused by the kernel).
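
The arithmetic behind the second item can be sketched in plain C++. This is only an illustration, not the Eigen code: `Dim3` is a hypothetical stand-in for a CUDA-style launch extent, and the 8x8x8 and 8x8x1 values are assumed for the example rather than taken from the patch. It shows that an extent of 8 in a dimension the kernel never indexes multiplies the launched work by 8.

```cpp
#include <cstdint>

// Hypothetical stand-in for a CUDA-style 3-D launch extent (not the real dim3).
struct Dim3 {
  std::uint64_t x, y, z;
};

// Total number of work items described by a 3-D extent.
constexpr std::uint64_t volume(Dim3 d) { return d.x * d.y * d.z; }

// Assumed example values: the kernel is taken to index only .x and .y,
// so any .z extent above 1 is launched but does no useful work.
constexpr Dim3 kWithUnusedZ{8, 8, 8};
constexpr Dim3 kWithoutUnusedZ{8, 8, 1};

// The unused .z extent of 8 inflates the launch by exactly a factor of 8.
static_assert(volume(kWithUnusedZ) == 8 * volume(kWithoutUnusedZ),
              "unused z extent multiplies launched work by 8");
```

Setting the unused dimension to 1 keeps the same x/y coverage while dropping the redundant work, which is the kind of saving the commit message describes.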
Diffstat (limited to 'unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h')
0 files changed, 0 insertions, 0 deletions