path: root/Eigen/src/Core/arch/SSE/Complex.h
author: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com> 2018-12-21 11:03:18 -0800
committer: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com> 2018-12-21 11:03:18 -0800
commit: 1024a70e82c0301d9f699fd344613e9cd417ab95
tree: 8bade6795acd372e4a5e4d84e062640b52fb9a9a /Eigen/src/Core/arch/SSE/Complex.h
parent: e763fcd09e620300226ca22d152b94867123b603
gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs
The patch works by altering the gebp lhs packing routines to also consider
½ and ¼ packet length rows when packing, besides the original whole-packet
and row-by-row attempts. Finally, gebp itself will try to fit a fraction of a
packet at a time if:

i) ½ and/or ¼ packets are available for the current context (e.g. AVX2 and
SSE-sized SIMD registers for x86)

ii) The matrix's height is favorable to it (the matrix may be too small in
that dimension to take full advantage of the current/maximum packet width, or
the last rows may be able to use smaller packets before gebp goes row-by-row)

This helps mitigate the huge slowdowns seen on AVX512 builds when compared to
AVX2 ones, for some dimensions. Gains top out at an extra 1x in throughput.

This patch is a complement to changeset
4ad359237aeb519dbd4b55eba43057b37988838c.

Since packing is changed, Eigen users who rely on very low-level API usage,
like TensorFlow, will have to be adapted to work with the changes.
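
The peeling order described above (whole packets first, then ½ packets, then
¼ packets, then single rows) can be illustrated with a minimal sketch. This is
not Eigen's actual packing code: `RowBlock` and `partition_rows` are
hypothetical names introduced only for illustration, and the sketch assumes
the fractional packet widths are simply `packet_size / 2` and
`packet_size / 4`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: one group of lhs rows handled together.
struct RowBlock {
    std::size_t start;  // index of the first row in the block
    std::size_t width;  // number of rows in the block (1 means row-by-row)
};

// Illustrative only: split `rows` lhs rows into whole-packet, half-packet and
// quarter-packet blocks, falling back to single rows for whatever remains.
std::vector<RowBlock> partition_rows(std::size_t rows, std::size_t packet_size) {
    std::vector<RowBlock> blocks;
    std::size_t i = 0;
    const std::size_t widths[] = {packet_size, packet_size / 2, packet_size / 4};
    for (std::size_t w : widths) {
        if (w <= 1) continue;  // assumed: this fractional packet is unavailable
        for (; i + w <= rows; i += w) blocks.push_back({i, w});
    }
    // Remaining rows are processed one at a time.
    for (; i < rows; ++i) blocks.push_back({i, 1});
    return blocks;
}
```

With a packet of 16 floats (an AVX512-sized register), 23 rows would be split
into one whole packet, one ¼ packet and three single rows, instead of one
whole packet followed by seven single rows.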
Diffstat (limited to 'Eigen/src/Core/arch/SSE/Complex.h')
0 files changed, 0 insertions, 0 deletions