author    Justin Lebar <jlebar@google.com>  2017-03-02 17:49:45 -0800
committer TensorFlower Gardener <gardener@tensorflow.org>  2017-03-02 18:08:01 -0800
commit    01194694948eb883e99af597d9dbbf3fc9f5c9e2 (patch)
tree      ab3517cf656259681283a90c6682c5b320ac36e3 /tensorflow/compiler/xla/service/gpu/gemm_thunk.h
parent    e065b3093f4fec5a5f79ad9de81f6baab361962e (diff)
[XLA] [StreamExecutor] Tune GEMMs when possible.
cuBLAS 8 adds the cublasGemmEx function, which lets you specify an
explicit "algorithm" for the computation. This functions as an opaque
tuning hint to cuBLAS.
This patch adds support for cublasGemmEx to StreamExecutor, and wires up
XLA's GemmThunk to use the new function.
This patch does not add GEMM autotuning support in TensorFlow proper,
only XLA.
Change: 149068961
Diffstat (limited to 'tensorflow/compiler/xla/service/gpu/gemm_thunk.h')
 tensorflow/compiler/xla/service/gpu/gemm_thunk.h | 8 ++++++++
 1 file changed, 8 insertions(+), 0 deletions(-)
diff --git a/tensorflow/compiler/xla/service/gpu/gemm_thunk.h b/tensorflow/compiler/xla/service/gpu/gemm_thunk.h
index b540da65b4..983cb87292 100644
--- a/tensorflow/compiler/xla/service/gpu/gemm_thunk.h
+++ b/tensorflow/compiler/xla/service/gpu/gemm_thunk.h
@@ -63,6 +63,14 @@ class GemmThunk : public Thunk {
   const bool transpose_lhs_;
   const bool transpose_rhs_;
+
+  // Maps device names (StreamExecutor::DeviceDescription::name()) to autotune
+  // results. The map's value is the best algorithm we've found for this thunk
+  // on this device, or an error if none of the algorithms worked and we should
+  // use the regular gemm without an algorithm.
+  std::unordered_map<string,
+                     StatusOr<::perftools::gputools::blas::AlgorithmType>>
+      autotune_results_;
 };
 
 }  // namespace gpu