path: root/unsupported/test/cxx11_tensor_contraction.cpp
author: Rasmus Munk Larsen <rmlarsen@google.com> 2019-08-07 12:57:42 -0700
committer: Rasmus Munk Larsen <rmlarsen@google.com> 2019-08-07 12:57:42 -0700
commit: eab7e52db217240d0320e2618eafa37f45158b83
tree: 30a915ab749df5fd599e4b4d6de867afd248c6c6 /unsupported/test/cxx11_tensor_contraction.cpp
parent: 09871261653b4a373b2aed1561c38a7f5d21a21e
[Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
A few benchmark numbers:

name                  old time/op  new time/op  delta
BM_Xent_16_10000_cpu  448µs ± 3%   389µs ± 2%   -13.21%  (p=0.008 n=5+5)
BM_Xent_32_10000_cpu  575µs ± 6%   454µs ± 3%   -21.00%  (p=0.008 n=5+5)
BM_Xent_64_10000_cpu  933µs ± 4%   712µs ± 1%   -23.71%  (p=0.008 n=5+5)
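
For illustration only, here is a minimal sketch of the general idea in the commit message: take a contiguous fast path when both strides are known to be 1, and fall back to strided indexing otherwise. This is not Eigen's implementation. The names apply_cwise and doubled are hypothetical, and the fast path relies on the compiler auto-vectorizing a plain contiguous loop rather than on Eigen's packet primitives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): applies `op` to `count` elements
// read from `in` and written to `out`, using the given element strides.
template <typename T, typename UnaryOp>
void apply_cwise(const T* in, std::ptrdiff_t in_stride,
                 T* out, std::ptrdiff_t out_stride,
                 std::ptrdiff_t count, UnaryOp op) {
  assert(count >= 0);
  if (in_stride == 1 && out_stride == 1) {
    // Fast path: both ranges are contiguous, so this loop has unit-stride
    // loads and stores that a compiler can typically vectorize.
    for (std::ptrdiff_t i = 0; i < count; ++i) {
      out[i] = op(in[i]);
    }
    return;
  }
  // General path: arbitrary strides, evaluated one coefficient at a time.
  for (std::ptrdiff_t i = 0; i < count; ++i) {
    out[i * out_stride] = op(in[i * in_stride]);
  }
}

// Hypothetical usage: doubles every element of a contiguous vector,
// which takes the stride-1 fast path.
inline std::vector<float> doubled(const std::vector<float>& v) {
  std::vector<float> result(v.size());
  apply_cwise(v.data(), 1, result.data(), 1,
              static_cast<std::ptrdiff_t>(v.size()),
              [](float x) { return x * 2.0f; });
  return result;
}
```

Both paths produce the same values; only the memory access pattern differs, which is why the stride check can be made once per block rather than per coefficient.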
Diffstat (limited to 'unsupported/test/cxx11_tensor_contraction.cpp')
0 files changed, 0 insertions, 0 deletions