path: root/tensorflow/stream_executor/blas.h
Commit message · Author · Age

* Alias tensorflow::gtl::InlinedVector to absl::InlinedVector
  Benjamin Kramer, 2018-09-05

  PiperOrigin-RevId: 211639440

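For orientation, a minimal sketch of what such a forwarding alias can look like. This is not the actual TF header; the template parameters and defaults there may differ:

```cpp
// Hypothetical sketch: forwarding tensorflow::gtl::InlinedVector to Abseil.
#include <cstddef>
#include <memory>

#include "absl/container/inlined_vector.h"

namespace tensorflow {
namespace gtl {

// Old call sites keep spelling gtl::InlinedVector<T, N>; the storage and
// behavior now come from absl::InlinedVector.
template <typename T, size_t N, typename A = std::allocator<T>>
using InlinedVector = absl::InlinedVector<T, N, A>;

}  // namespace gtl
}  // namespace tensorflow
```
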
* [XLA:GPU] Use strided batched gemm instead of building pointer tables.
  Benjamin Kramer, 2018-08-03

  This is mostly a huge amount of plumbing just to call into the cuBLAS
  functions. The cublas<T>gemmStridedBatched functions have been available
  since CUDA 8.0. For autotuning we'd need cublasGemmStridedBatchedEx,
  which is new in CUDA 9.2, so I didn't wire that up yet.

  PiperOrigin-RevId: 207285707

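As a hedged illustration of the API this commit plumbs through, here is a direct call to the float variant. The helper name and the packed, column-major, no-transpose layout are assumptions for the example:

```cpp
// Sketch: cuBLAS strided batched GEMM (float case). Computes
// C[i] = alpha * A[i] * B[i] + beta * C[i] for i in [0, batch), where
// consecutive matrices sit 'stride' elements apart, so no per-batch
// pointer table has to be built in device memory.
#include <cublas_v2.h>

cublasStatus_t StridedBatchedGemmF32(cublasHandle_t handle, int m, int n,
                                     int k, const float* a, const float* b,
                                     float* c, int batch) {
  const float alpha = 1.0f, beta = 0.0f;
  // Column-major, no transposes; each matrix is densely packed, so the
  // batch stride is simply its element count.
  return cublasSgemmStridedBatched(
      handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
      a, /*lda=*/m, /*strideA=*/static_cast<long long>(m) * k,
      b, /*ldb=*/k, /*strideB=*/static_cast<long long>(k) * n,
      &beta, c, /*ldc=*/m, /*strideC=*/static_cast<long long>(m) * n, batch);
}
```
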
* Add GPU support for float16 batched matmul (#18436)
  Ben Barsdell, 2018-05-10

  * Add GPU support for float16 batched matmul
    - Uses cublasGemmBatchedEx introduced in CUDA 9.1.
    - Includes support for Tensor Op math.
    - Falls back to a loop over non-batched gemm calls on older CUDA
      versions or GPU architectures.
  * Refactor GPU batched gemm into one internal func

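A hedged sketch of the new entry point, with fp16 inputs/outputs and fp32 accumulation. The helper name is invented, and the signature shown is the CUDA 9.x-era one (computeType as a cudaDataType_t; CUDA 11+ takes a cublasComputeType_t instead):

```cpp
// Sketch: fp16 batched GEMM via cublasGemmBatchedEx. a_array/b_array/c_array
// are device arrays of per-batch device pointers. alpha/beta match the
// computation type (fp32 here), so they are plain floats.
#include <cublas_v2.h>

cublasStatus_t BatchedGemmF16(cublasHandle_t handle, int m, int n, int k,
                              const void* const* a_array,
                              const void* const* b_array, void* const* c_array,
                              int batch) {
  // Opt in to Tensor Core math where the GPU supports it (CUDA 9 API).
  cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);
  const float alpha = 1.0f, beta = 0.0f;
  return cublasGemmBatchedEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                             a_array, CUDA_R_16F, /*lda=*/m,
                             b_array, CUDA_R_16F, /*ldb=*/k, &beta,
                             c_array, CUDA_R_16F, /*ldc=*/m, batch,
                             /*computeType=*/CUDA_R_32F,
                             CUBLAS_GEMM_DEFAULT_TENSOR_OP);
}
```
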
* Add variants of DoBlasGemmWithAlgorithm with alpha being on device.
  A. Unique TensorFlower, 2018-04-24

  This is in preparation for allowing XLA to fuse (A dot b) * alpha, where
  alpha can be on device instead of just a constant.

  PiperOrigin-RevId: 194068597

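The commit itself is about the StreamExecutor wrapper; the sketch below (with an assumed helper name) shows the underlying cuBLAS mechanism that makes a device-resident alpha possible, namely the device pointer mode:

```cpp
// Sketch: with CUBLAS_POINTER_MODE_DEVICE, alpha/beta are read from device
// memory, so a fused kernel can produce alpha right before the GEMM
// consumes it, with no host round-trip.
#include <cublas_v2.h>

cublasStatus_t GemmWithDeviceAlpha(cublasHandle_t handle, int m, int n, int k,
                                   const float* d_alpha,  // device pointer
                                   const float* a, const float* b,
                                   const float* d_beta,   // device pointer
                                   float* c) {
  cublasStatus_t status =
      cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
  if (status != CUBLAS_STATUS_SUCCESS) return status;
  status = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, d_alpha,
                       a, m, b, k, d_beta, c, m);
  // Restore the default so later calls can keep passing host scalars.
  cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST);
  return status;
}
```
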
* [StreamExecutor] Rename ::perftools::gputools -> ::stream_executor, part 1.
  Justin Lebar, 2018-04-17

  Step 1 of re-namespace'ing StreamExecutor into ::stream_executor.

  This moves everything inside of stream_executor/... and leaves a
  namespace alias into ::perftools::gputools. The next steps will clean up
  users to use the new namespace.

  This is mostly a mechanical change, but it also includes a bunch of
  non-mechanical changes that ideally would be split out into separate
  patches. Unfortunately they all sort of need to be shoved in here for
  various reasons:

  - Forward declarations need to be in the same namespace as the actual
    types, so we need to change all forward declarations of StreamExecutor
    types in this one patch.
  - Uses of these forward declarations need to be changed to the new
    namespace (or otherwise we need to add a namespace alias to the
    relevant header, but this is pretty ugly).
  - Various initialization code needs to live in StreamExecutor's "real"
    namespace, so all this needs to be changed.

  PiperOrigin-RevId: 193256128

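A minimal sketch, assuming the usual shape of such a compatibility shim (the exact form in the TF tree may differ), of how the old name stays alive while the code moves:

```cpp
// The types now live in their "real" namespace; forward declarations must
// live here too, which is why they all change in this one patch.
namespace stream_executor {
class Stream;
}  // namespace stream_executor

// Legacy alias so existing ::perftools::gputools call sites keep compiling.
namespace perftools {
namespace gputools = ::stream_executor;
}  // namespace perftools

// Old code such as `perftools::gputools::Stream* s;` still resolves.
```
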
* [XLA] FP16 Dot support for the CPU and GPU backends.
  Bixia Zheng, 2018-02-28

  Extend the stream interface ThenBlasGemmWithAlgorithm to support F16
  matrix multiplication with computation type FP32.

  Extend the stream executor interface DoBlasGemmWithAlgorithm to support
  F16 GEMM with computation type FP32.

  Extend the CPU IR emitter to handle the F16 Dot instruction, and add an
  F16 matrix multiplication implementation to the CPU runtime.

  Extend the GPU backend to handle the FP16 GEMM Thunk.

  Replicate the existing matrix multiplication test cases in
  matrix_ops_simple_test and dot_operation_test for FP16.

  RELNOTES:
  PiperOrigin-RevId: 187369731

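On the GPU side, "F16 GEMM with computation type FP32" maps to a cuBLAS call along these lines; the helper name and layout are assumptions for illustration:

```cpp
// Sketch: fp16 inputs/outputs with an fp32 computation type via
// cublasGemmEx. alpha/beta match computeType, so they are floats here
// even though A/B/C are half precision.
#include <cublas_v2.h>

cublasStatus_t GemmF16WithF32Compute(cublasHandle_t handle, int m, int n,
                                     int k, const void* a, const void* b,
                                     void* c) {
  const float alpha = 1.0f, beta = 0.0f;
  return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                      a, CUDA_R_16F, /*lda=*/m,
                      b, CUDA_R_16F, /*ldb=*/k, &beta,
                      c, CUDA_R_16F, /*ldc=*/m,
                      /*computeType=*/CUDA_R_32F, CUBLAS_GEMM_DEFAULT);
}
```
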
* Update Stream::BlockHostUntilDone examples and documentation.
  A. Unique TensorFlower, 2017-12-13

  The new Status return value must be explicitly handled or ignored.

  PiperOrigin-RevId: 178950527

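A short sketch of the two patterns "handled or ignored" implies for callers. The namespace spelling follows the later ::stream_executor convention, and the LOG macro is assumed available, as is usual in this codebase:

```cpp
// Sketch: Stream::BlockHostUntilDone now returns a Status that must not be
// silently dropped.
#include "tensorflow/stream_executor/stream.h"

void WaitForStream(stream_executor::Stream* stream) {
  // Handle the result explicitly...
  stream_executor::port::Status status = stream->BlockHostUntilDone();
  if (!status.ok()) {
    LOG(ERROR) << "Stream did not complete: " << status.ToString();
  }
  // ...or, where failure is deliberately tolerated, discard it explicitly.
  (void)stream->BlockHostUntilDone();
}
```
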
* Let GetBlasGemmAlgorithms() always return true.
  Yangzihao Wang, 2017-07-21

  PiperOrigin-RevId: 162748507

* Automated g4 rollback of changelist 162423171
  A. Unique TensorFlower, 2017-07-18

  PiperOrigin-RevId: 162437318

* Add autotuning code for matmul operator.
  Yangzihao Wang, 2017-07-18

  Currently it is turned off by default.

  PiperOrigin-RevId: 162423171

* Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor.
  A. Unique TensorFlower, 2017-07-06

  PiperOrigin-RevId: 161137741

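A hedged sketch of the integer path through cublasGemmEx. The helper name is invented; note that cuBLAS imposes alignment constraints on this configuration (for example, dimensions and leading dimensions that are multiples of four):

```cpp
// Sketch: int8 x int8 -> int32 GEMM. alpha/beta are int32 to match the
// computation type CUDA_R_32I.
#include <cstdint>

#include <cublas_v2.h>

cublasStatus_t GemmS8S8S32(cublasHandle_t handle, int m, int n, int k,
                           const int8_t* a, const int8_t* b, int32_t* c) {
  const int32_t alpha = 1, beta = 0;
  return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                      a, CUDA_R_8I, /*lda=*/m,
                      b, CUDA_R_8I, /*ldb=*/k, &beta,
                      c, CUDA_R_32I, /*ldc=*/m,
                      /*computeType=*/CUDA_R_32I, CUBLAS_GEMM_DEFAULT);
}
```
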
* [XLA] [StreamExecutor] Tune GEMMs when possible.
  Justin Lebar, 2017-03-02

  cublas 8 adds the cublasGemmEx function, which lets you specify an
  explicit "algorithm" for the computation. This functions as an opaque
  tuning hint to cublas.

  This patch adds support for cublasGemmEx to StreamExecutor, and wires up
  XLA's GemmThunk to use the new function.

  This patch does not add GEMM autotuning support in TensorFlow proper,
  only XLA.

  Change: 149068961

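To make the "opaque tuning hint" concrete, here is a sketch of a simple autotuner: run the same GEMM under each algorithm value and keep the fastest one that succeeds. The function name and the ALGO0..ALGO7 range are illustrative; real code enumerates whatever the installed cuBLAS exposes:

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

cublasGemmAlgo_t PickBestAlgo(cublasHandle_t handle, int m, int n, int k,
                              const float* a, const float* b, float* c) {
  const float alpha = 1.0f, beta = 0.0f;
  cublasGemmAlgo_t best = CUBLAS_GEMM_DEFAULT;
  float best_ms = 1e30f;
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  for (int algo = CUBLAS_GEMM_ALGO0; algo <= CUBLAS_GEMM_ALGO7; ++algo) {
    cudaEventRecord(start);
    cublasStatus_t status = cublasGemmEx(
        handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha, a, CUDA_R_32F, m,
        b, CUDA_R_32F, k, &beta, c, CUDA_R_32F, m,
        /*computeType=*/CUDA_R_32F, static_cast<cublasGemmAlgo_t>(algo));
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Some algorithms reject certain shapes; only time the ones that ran.
    if (status == CUBLAS_STATUS_SUCCESS && ms < best_ms) {
      best_ms = ms;
      best = static_cast<cublasGemmAlgo_t>(algo);
    }
  }
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return best;
}
```
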
* Remove Eigen/Core includes from public SE headers
  A. Unique TensorFlower, 2017-01-30

  This gives the Eigen headers the chance to define the macro
  EIGEN_USE_GPU in the proper places.

  Change: 146055230

* Update copyright for 3p/tf.
  A. Unique TensorFlower, 2016-06-02

  Change: 123901292

* Add fp16 matrix multiplication (GEMM) support to StreamExecutor, gated on compilation with CUDA 7.5.
  A. Unique TensorFlower, 2016-05-11

  fp16 convolutions via cuDNN will come soon. This does not update any
  TensorFlow ops, but it is a dependency of doing that.

  Note: fp16 axpy and dot do not exist in CUDA 7.5 and have thus not been
  added. CUDA 8.0 supports both (through the axpyEx and dotEx interfaces).

  Change: 122069402

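The CUDA 7.5-era entry point for this was cublasSgemmEx: half-precision A/B/C with float accumulation (alpha/beta are floats). A hedged sketch, using the later cudaDataType_t spelling of the type enums (CUDA 7.5 itself spelled them cublasDataType_t/CUBLAS_DATA_HALF), with an assumed helper name:

```cpp
#include <cublas_v2.h>

cublasStatus_t GemmF16ViaSgemmEx(cublasHandle_t handle, int m, int n, int k,
                                 const void* a, const void* b, void* c) {
  const float alpha = 1.0f, beta = 0.0f;
  // fp16 in, fp16 out, fp32 accumulate.
  return cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                       a, CUDA_R_16F, /*lda=*/m,
                       b, CUDA_R_16F, /*ldb=*/k, &beta,
                       c, CUDA_R_16F, /*ldc=*/m);
}
```
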
* Support ScratchAllocator in BLAS Batched GEMM
  A. Unique TensorFlower, 2016-03-18

  Change: 117590857

* TensorFlow: upstream changes to git.
  Vijay Vasudevan, 2015-12-08

  Change 109695551: Update FAQ.

  Change 109694725: Add a gradient for resize_bilinear op.

  Change 109694505: Don't mention variables module in docs;
  variables.Variable should be tf.Variable.

  Change 109658848: Adding an option to create a new thread-pool for each
  session.

  Change 109640570: Take the snapshot of stream-executor. Expose an
  interface for scratch space allocation in the interface.

  Change 109638559: Let image_summary accept uint8 input. This allows
  users to do their own normalization / scaling if the default (very
  weird) behavior of image_summary is undesired. This required a slight
  tweak to fake_input.cc to make polymorphically typed fake inputs infer
  if their type attr is not set but has a default. Unfortunately, adding a
  second valid type to image_summary *disables* automatic implicit
  conversion from np.float64 to tf.float32, so this change is slightly
  backwards incompatible.

  Change 109636969: Add serialization operations for SparseTensor.

  Change 109636644: Update generated Op docs.

  Change 109634899: TensorFlow: add a markdown file for producing release
  notes for our releases. Seed with 0.5.0 with a boring but accurate
  description.

  Change 109634502: Let histogram_summary take any realnumbertype. It used
  to take only floats; now it understands ints.

  Change 109634434: TensorFlow: update locations where we mention python 3
  support; update them to current truth.

  Change 109632108: Move HSV <> RGB conversions, grayscale conversions,
  and adjust_* ops back to tensorflow
    - make GPU-capable version of RGBToHSV and HSVToRGB, allows only float
      input/output
    - change docs to reflect new size constraints
    - change HSV format to be [0,1] for all components
    - add automatic dtype conversion for all adjust_* and grayscale
      conversion ops
    - fix up docs

  Change 109631077: Improve optimizer exceptions
    1. grads_and_vars is now a tuple, so must be wrapped when passed to
       format.
    2. Use '%r' instead of '%s' for dtype formatting.

  Base CL: 109697989

* TensorFlow: Improve performance of Alexnet
  Manjunath Kudlur, 2015-11-20

  Changes:
  * error message that refers to removed `DefaultSession` method.
  * -Wnull-conversion warnings
  * the "_start_time" attr for recvs when the flag
    "--brain_enable_scheduling_for_recvs" is set.
  * typo in tutorial data download progress message.
  * a typo ("however their installing" => "however installing").
  * typo, rename "TensorFlow Mechanics" to "How To" to be consistent with
    the website.
  * a typo ("subtact" => "subtract").
  * protobuf examples in comments in tensorflow::Example.proto.
  * formula formatting in MNIST beginner tutorial
  * negative fraction-of-queue-full stats
  * protobuf inclusion path so that Android demo will build under Blaze.
  * small typo ("moderatly" => "moderately")
  * Session.run() to check that tensor arguments come from the session's
    graph.
  * another six import
  * seq2seq typo in bazel command

  Base CL: 108349164

* TensorFlow: Initial commit of TensorFlow library.
  Manjunath Kudlur, 2015-11-06

  TensorFlow is an open source software library for numerical computation
  using data flow graphs.

  Base CL: 107276108