[XLA:GPU] Use strided batched gemm instead of building pointer tables. - tensorflow

author	Benjamin Kramer <kramerb@google.com>	2018-08-03 10:24:13 -0700
committer	TensorFlower Gardener <gardener@tensorflow.org>	2018-08-03 10:28:08 -0700
commit	e36c16c6720dd64ae5d8a1f8555102a1323af9ae (patch)
tree	9076cc7fa753542490bac8b1fcd2bc3a3c2ac62a /third_party/python_runtime
parent	7935c176118f0e50aa657a1c68a85430b70d2245 (diff)

[XLA:GPU] Use strided batched gemm instead of building pointer tables.

This is mostly a huge amount of plumbing just to call into the cublas functions. blasGemmStridedBatched has been available since CUDA 8.0. For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2 so I didn't wire that up yet.

PiperOrigin-RevId: 207285707
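
For readers unfamiliar with the distinction the subject line draws, the following is a minimal CPU-only sketch of the two batching styles. It is not the XLA or cuBLAS code from this commit. All function names (Gemm, GemmBatchedPointers, GemmStridedBatched) are hypothetical, and it assumes row-major storage for readability, whereas cuBLAS itself is column-major. The pointer-table form mirrors the shape of cublasSgemmBatched, which takes arrays of per-matrix pointers. The strided form mirrors cublasSgemmStridedBatched, which takes one base pointer per operand plus a fixed element stride between consecutive matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major C[m x n] = A[m x k] * B[k x n] for a single matrix.
void Gemm(const float* a, const float* b, float* c, int m, int n, int k) {
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
  }
}

// Pointer-table style: the caller builds one pointer per matrix per operand.
// On a GPU these tables would also have to be copied to device memory.
void GemmBatchedPointers(const std::vector<const float*>& a,
                         const std::vector<const float*>& b,
                         const std::vector<float*>& c, int m, int n, int k) {
  assert(a.size() == b.size() && b.size() == c.size());
  for (std::size_t i = 0; i < a.size(); ++i) Gemm(a[i], b[i], c[i], m, n, k);
}

// Strided style: matrix i of each operand starts at base + i * stride,
// so no pointer table is needed when the batch is laid out regularly.
void GemmStridedBatched(const float* a, std::ptrdiff_t stride_a,
                        const float* b, std::ptrdiff_t stride_b, float* c,
                        std::ptrdiff_t stride_c, int m, int n, int k,
                        int batch) {
  for (int i = 0; i < batch; ++i) {
    Gemm(a + i * stride_a, b + i * stride_b, c + i * stride_c, m, n, k);
  }
}
```

When every matrix in a batch sits at a constant offset from the previous one, both forms compute the same result, and the strided form avoids allocating and filling the pointer arrays.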

Diffstat (limited to 'third_party/python_runtime')

0 files changed, 0 insertions, 0 deletions

