path: root/tensorflow/stream_executor/stream.cc
Commit message | Author | Date
* Fix and complete StreamExecutor's DoFusedConvolve (Tim Shen, 2018-09-17):
  - bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
  - double is supported but not exposed through the public interface.
  - DoFusedConvolveImpl has duplicated information in its template parameter list.
  PiperOrigin-RevId: 213308435
* [SE] Avoid deadlock by calling HostCallbacks even when the stream is in an error state (A. Unique TensorFlower, 2018-08-22)
  HostCallbacks may trigger notifications that, if elided, would cause programs to hang. Ideally we would have errback semantics, but this is a band-aid while the semantics are redefined.
  PiperOrigin-RevId: 209818770
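  A minimal sketch of the band-aid described above, assuming StreamExecutor-style names (CheckError, parent_->HostCallback); this is a simplification, not the literal change:

    #include <functional>

    // Enqueue the host callback even when the stream has already seen an
    // error: eliding it could leave another thread blocked forever on a
    // notification that the callback was supposed to fire.
    Stream &Stream::ThenDoHostCallback(std::function<void()> callback) {
      if (!ok()) {
        LOG(WARNING) << "stream is in an error state; host callback runs anyway";
      }
      CheckError(parent_->HostCallback(this, std::move(callback)));
      return *this;
    }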
* [SE] Don't CHECK-fail when the stream is not-OK (A. Unique TensorFlower, 2018-08-22)
  This check-fail was wrong anyway; it was meant to check the *substream's* status, but checked its own instead. We could be in an error state, and that's absolutely fine; we shouldn't kill the process for this.
  PiperOrigin-RevId: 209721359
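  A hedged sketch of the bug described (MarkReturned is a hypothetical bookkeeping helper; the real function differs):

    void Stream::ReturnSubStream(Stream *sub_stream) {
      // The old code effectively did:
      //   CHECK(ok());   // wrong object (this, not sub_stream), and fatal
      // The fix consults the substream itself and tolerates failure:
      if (!sub_stream->ok()) {
        LOG(ERROR) << "returned substream has failed; it will not be reused";
        // Fall through: an error state is legitimate and must not kill
        // the process.
      }
      MarkReturned(sub_stream);  // hypothetical bookkeeping helper
    }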
* Implement DoHostCallbackWithStatus to allow callbacks to return a status (A. Unique TensorFlower, 2018-08-07)
  PiperOrigin-RevId: 207714420
* Drop failed sub-streams during both Get and Return. (Todd Wang, 2018-08-03)
  The old code ensured that failed sub-streams would not be re-used, but had two flaws:
  1) It only checked for failed sub-streams during Return.
  2) It didn't actually remove the failed sub-streams from our state.
  The new code fixes these two flaws, and adds an extra test that explains why (1) is insufficient.
  PiperOrigin-RevId: 207333296
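  A minimal sketch of the pruning this implies, assuming the pool is a vector of (substream, in-use) pairs guarded by mu_ (the real bookkeeping differs in detail):

    #include <memory>
    #include <utility>
    #include <vector>

    Stream *Stream::GetOrCreateSubStream() {
      absl::MutexLock lock(&mu_);
      for (auto it = sub_streams_.begin(); it != sub_streams_.end();) {
        if (it->second) {
          ++it;                         // currently handed out; skip
        } else if (!it->first->ok()) {
          it = sub_streams_.erase(it);  // flaw (2): actually remove failures,
                                        // and do it in Get too, fixing (1)
        } else {
          it->second = true;            // healthy and idle: reuse it
          return it->first.get();
        }
      }
      sub_streams_.emplace_back(std::make_unique<Stream>(parent_), true);
      sub_streams_.back().first->Init();
      return sub_streams_.back().first.get();
    }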
* [XLA:GPU] Use strided batched gemm instead of building pointer tables. (Benjamin Kramer, 2018-08-03)
  This is mostly a huge amount of plumbing just to call into the cublas functions. blasGemmStridedBatched has been available since CUDA 8.0. For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2, so I didn't wire that up yet.
  PiperOrigin-RevId: 207285707
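  For reference, a hedged sketch of the cuBLAS entry point this plumbing targets: one base pointer plus a fixed stride per batch element replaces a device-side table of A/B/C pointers (the wrapper name is illustrative):

    #include <cublas_v2.h>

    // Computes C[i] = A[i] * B[i] for i in [0, batch), where
    // A[i] = A + i * strideA, and likewise for B and C.
    cublasStatus_t BatchedGemm(cublasHandle_t handle, int m, int n, int k,
                               const float *A, long long strideA,
                               const float *B, long long strideB,
                               float *C, long long strideC, int batch) {
      const float alpha = 1.0f, beta = 0.0f;
      return cublasSgemmStridedBatched(
          handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
          A, /*lda=*/m, strideA,
          B, /*ldb=*/k, strideB, &beta,
          C, /*ldc=*/m, strideC, batch);
    }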
* [SE] Ensure we BlockHostUntilDone before we deallocate temporary memory (A. Unique TensorFlower, 2018-07-30)
  PiperOrigin-RevId: 206595861
* Ensure failed sub-streams are not re-used. (Todd Wang, 2018-07-25)
  Streams have a monotonic state machine; if a stream encounters any error, it will remain in an error state forever. Without this change, a previously failed sub-stream would be put back on sub_streams_, only to cause the next usage of the sub-stream to trivially fail.
  PiperOrigin-RevId: 206112024
* Automated rollback of commit 0ea6847c892497afdd20c1150fee1e532612ca17 (A. Unique TensorFlower, 2018-07-24)
  PiperOrigin-RevId: 205885304
* Merge pull request #20706 from ROCmSoftwarePlatform:upstream-staging-stream-executor-pooling-interface (TensorFlower Gardener, 2018-07-16)
  PiperOrigin-RevId: 204805678
* Automated rollback of commit f8044c89287b1d90510ceace4b53ec94abaffa50 (A. Unique TensorFlower, 2018-07-12)
  PiperOrigin-RevId: 204327453
* Automated rollback of commit 0ea6847c892497afdd20c1150fee1e532612ca17 (A. Unique TensorFlower, 2018-07-12)
  PiperOrigin-RevId: 204326206
* [ROCm] Interface changes for pooling APIs in StreamExecutor (Wen-Heng (Jack) Chung, 2018-07-11)
  Due to the design of MIOpen, the DNN library on the ROCm platform, an instance of ScratchAllocator has to be passed into pooling routines. This commit addresses such interface changes and the corresponding implementation in the CUDA StreamExecutor. Merged via pull request #20706 above.
* [SE,XLA] Switch to using multiple streams in xla_device_context (A. Unique TensorFlower, 2018-07-09)
  Instead of having one stream for compute, host-to-device and device-to-host transfers, switch to having separate streams, just like the GPU does. Add a se::Event field to XlaTensor to allow accurate inter-stream dependencies to be created.
  As part of this:
  - Fix TransferManager::TransferLiteralFrom/ToDevice to correctly make generated substreams wait on their master stream.
  - Fix Stream::BlockHostUntilDone() to not block on or return substreams. This behavior is completely broken and not only nondeterministically returns substreams to the pool but causes indefinite hangs with the HostStream.
  PiperOrigin-RevId: 203726543
* Add return statement to end of ToVlogString(dnn::DataType data_type) (James Keeling, 2018-06-13)
  Whilst the switch statement covers all possible enum values, the compiler still complains that it reaches the end of the function without returning a value. I add an "unknown" string, mirroring the one in the function just above.
  PiperOrigin-RevId: 200452885
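  A minimal sketch of the pattern (the enum shown is an illustrative subset of dnn::DataType):

    #include <string>

    namespace dnn { enum class DataType { kFloat, kDouble, kHalf }; }

    std::string ToVlogString(dnn::DataType data_type) {
      switch (data_type) {
        case dnn::DataType::kFloat:  return "dnn::DataType::kFloat";
        case dnn::DataType::kDouble: return "dnn::DataType::kDouble";
        case dnn::DataType::kHalf:   return "dnn::DataType::kHalf";
      }
      // All enumerators are handled above, but control can still formally
      // reach here (an enum may hold an out-of-range value), so return a
      // fallback string to silence -Wreturn-type.
      return "unknown DataType";
    }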
* Merge changes from github. (Yifei Feng, 2018-05-24)
  Revert #18413: too many internal test failures due to the name scope change caused by this change.
  Revert #18192: cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
  PiperOrigin-RevId: 197991247
* Use parenthesis-based construction instead of brace initialization (Smit Hinsu, 2018-05-09)
  Updates all the construction calls for Status, ScopedActivateContext and mutexes within stream_executor to follow the recommendation in https://abseil.io/tips/88
  PiperOrigin-RevId: 196007931
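  A small illustration of the convention from abseil.io/tips/88 (standard-library types used as stand-ins for Status and friends):

    #include <string>
    #include <vector>

    // Parentheses mean "perform constructor logic"; braces mean "holds
    // exactly these values" and can silently pick initializer_list.
    std::string s(3, 'x');      // constructor: "xxx"
    std::string t{3, 'x'};      // initializer_list: "\x03x" -- surprise!
    std::vector<int> v(10, 2);  // ten copies of 2
    std::vector<int> w{10, 2};  // the two elements 10 and 2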
* Add variants of DoBlasGemmWithAlgorithm with alpha being on device. (A. Unique TensorFlower, 2018-04-24)
  This is in preparation for allowing XLA to fuse (A dot b) * alpha, where alpha can be on device instead of just a constant.
  PiperOrigin-RevId: 194068597
* [StreamExecutor] Rename ::perftools::gputools -> ::stream_executor, part 1. (Justin Lebar, 2018-04-17)
  Step 1 of re-namespace'ing StreamExecutor into ::stream_executor.
  This moves everything inside of stream_executor/..., and leaves a namespace alias into ::perftools::gputools. The next steps will clean up users to use the new namespace.
  This is mostly a mechanical change, but it also includes a bunch of non-mechanical changes that ideally would be split out into separate patches. Unfortunately they all sort of need to be shoved in here for various reasons:
  - Forward declarations need to be in the same namespace as the actual types, so we need to change all forward declarations of StreamExecutor types in this one patch.
  - Uses of these forward declarations need to be changed to the new namespace (or otherwise we need to add a namespace alias to the relevant header, but this is pretty ugly).
  - Various initialization code needs to live in StreamExecutor's "real" namespace, so all this needs to be changed.
  PiperOrigin-RevId: 193256128
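  A sketch of the alias arrangement described (file placement is illustrative):

    // Types now live in the new namespace...
    namespace stream_executor {
    class Stream;
    class StreamExecutor;
    }  // namespace stream_executor

    // ...and a namespace alias keeps existing perftools::gputools::Stream
    // spellings compiling until callers are migrated.
    namespace perftools {
    namespace gputools = ::stream_executor;
    }  // namespace perftools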
* Support RNN profiling in StreamExecutor for CUDA GPUs. (James Qin, 2018-04-06)
  This change doesn't yet apply autotuning to TF cuDNN kernels; it only provides the lower-level support.
  PiperOrigin-RevId: 191919566
* Make strcat.{h,cc} independent of Eigen. (A. Unique TensorFlower, 2018-03-21)
  PiperOrigin-RevId: 189954596
* [StreamExecutor] Remove ThenDoHostCallbackForTest -- it's identical to ThenDoHostCallback. (Justin Lebar, 2018-03-09)
  The reason this came about is: ThenDoHostCallback was once private, and ThenDoHostCallbackForTest was public. Then at some point ThenDoHostCallback became public, but the *ForTest one was never removed.
  PiperOrigin-RevId: 188459741
* StreamExecutor support for float64 convolutions and backprop. (Brian Patton, 2018-03-06)
  PiperOrigin-RevId: 188025477
* [StreamExecutor] Change "variance" to "inv_var" in BatchNormalizationBackward. (Justin Lebar, 2017-12-18)
  This parameter is not the variance of the data, but rather 1/sqrt(variance + epsilon). Neglecting epsilon, this is the inverse standard deviation. "inv_stddev" might be a better name, but "inv_var" is certainly better than plain "variance", and it matches nvidia's name for this parameter, which I think may override the desire for a more precise name.
  No functional change.
  PiperOrigin-RevId: 179352839
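  In code form, the quantity the renamed parameter actually carries (a one-line illustration):

    #include <cmath>

    // Not the raw variance: the reciprocal of the standard deviation
    // computed with the numerical-stability epsilon.
    float InvVar(float variance, float epsilon) {
      return 1.0f / std::sqrt(variance + epsilon);
    }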
* Remove Stream::BlockHostUntilDoneWithStatus; all callers use BlockHostUntilDone. (A. Unique TensorFlower, 2017-12-15)
  PiperOrigin-RevId: 179213341
* Stream::BlockHostUntilDone now returns Status rather than bool. (A. Unique TensorFlower, 2017-12-13)
  The now-deprecated Stream::BlockHostUntilDoneWithStatus remains, to facilitate a multi-CL renaming transition. Once all callers have been renamed to BlockHostUntilDone, *WithStatus will be removed.
  The StreamExecutor (private) method has also been renamed to BlockHostUntilDone. It's only used by Stream. The StreamExecutorInterface method will be renamed in a separate atomic CL. It's harder to perform that transition gradually, and we've already performed an atomic change previously, so we might as well fix it up in one shot.
  PiperOrigin-RevId: 178907807
* Add BlockHostUntilDoneWithStatus, which returns Status rather than bool. (A. Unique TensorFlower, 2017-12-06)
  Also fixed a deadlock in Stream::BlockHostUntilDone. The problem with the original code was that it grabbed mu_ before looping over substreams, and would call CheckError with mu_ still held. But CheckError will attempt to lock mu_ in the failure case, which would deadlock.
  PiperOrigin-RevId: 178191634
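  A hedged sketch of the deadlock and the usual fix, assuming an absl-style non-reentrant mutex (function and member names are simplified):

    #include <vector>

    // Deadlock-prone shape: CheckError() locks mu_ on failure, but this
    // frame already holds mu_.
    void Stream::BlockHostUntilDoneDeadlocky() {
      absl::MutexLock lock(&mu_);
      for (auto &pair : sub_streams_) {
        CheckError(pair.first->WaitUntilDone());  // may re-lock mu_!
      }
    }

    // Fixed shape: snapshot the substreams under the lock, release it,
    // then wait and handle errors with mu_ not held.
    void Stream::BlockHostUntilDoneFixed() {
      std::vector<Stream *> subs;
      {
        absl::MutexLock lock(&mu_);
        for (auto &pair : sub_streams_) subs.push_back(pair.first.get());
      }
      for (Stream *s : subs) {
        CheckError(s->WaitUntilDone());  // safe: mu_ is free here
      }
    }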
* Support Cudnn RNN Fp16 (James Qin, 2017-11-03)
  Relax CudnnRNNTestCompatibleRNNCells test error tolerance a bit.
  PiperOrigin-RevId: 174495089
* Add float16 support to tf.nn.fused_batch_norm on the GPU. (Reed Wanderman-Milne, 2017-09-27)
  Scale, offset, mean, and variance must still be float32 if the input is float16.
  PiperOrigin-RevId: 170239448
* Add int8 version of fused_conv2d_bias_activation operator for the forward phase, and support side_input and scaling parameters in float and int8 versions. (A. Unique TensorFlower, 2017-09-06)
  PiperOrigin-RevId: 167763219
* Automated g4 rollback of changelist 166276461 (A. Unique TensorFlower, 2017-08-24)
  PiperOrigin-RevId: 166305887
* Add int8 version of fused_conv2d_bias_activation operator for the forward phase, and support side_input and scaling parameters in float and int8 versions. (A. Unique TensorFlower, 2017-08-23)
  PiperOrigin-RevId: 166276461
* Let GetBlasGemmAlgorithms() always return true. (Yangzihao Wang, 2017-07-21)
  PiperOrigin-RevId: 162748507
* Automated g4 rollback of changelist 162423171 (A. Unique TensorFlower, 2017-07-18)
  PiperOrigin-RevId: 162437318
* Add autotuning code for matmul operator. (Yangzihao Wang, 2017-07-18)
  Currently it is turned off by default.
  PiperOrigin-RevId: 162423171
* Support float64 CuDNN RNN (James Qin, 2017-07-18)
  PiperOrigin-RevId: 162412879
* Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor. (A. Unique TensorFlower, 2017-07-06)
  PiperOrigin-RevId: 161137741
* [SE] ThenConvolveWithAlgorithm vlogs algorithm configs. (Jingyue Wu, 2017-06-27)
  PiperOrigin-RevId: 160292762
* [SE] Support alpha scale in cudnnTransformTensor (A. Unique TensorFlower, 2017-06-20)
  PiperOrigin-RevId: 159578357
* TransformTensor supports NCHW_VECT_C layout and int8 data type. (Jingyue Wu, 2017-06-12)
  - Add new DataType kInt8.
  - Add new DataLayout kBatchDepthYX4 and new FilterLayout kOutputInputYX4, both of which map to CUDNN_TENSOR_NCHW_VECT_C.
  - Change (Then|Do)TransformTensor to take input and output element types.
  - Add new tensor format FORMAT_NCHW_VECT_C, and change the utility functions in tensor_format.h to work with the new format.
  PiperOrigin-RevId: 158806412
* Pass int parameter by value, not by const reference (A. Unique TensorFlower, 2017-06-06)
  PiperOrigin-RevId: 158142102
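  A one-line illustration (function name hypothetical):

    // Before: a const reference to an int adds indirection for a type
    // that is cheaper to copy than to point at.
    void SetDeviceOrdinal(const int &ordinal);

    // After: small trivially-copyable parameters go by value.
    void SetDeviceOrdinal(int ordinal);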
* [SE] Add cudnnTransformTensor to StreamExecutor. (Jingyue Wu, 2017-06-05)
  PiperOrigin-RevId: 158062553
* Add functional support for cudnnConvolutionBiasActivationForward(). (Yangzihao Wang, 2017-06-01)
  PiperOrigin-RevId: 157788425
* Merge changes from github. (A. Unique TensorFlower, 2017-04-04)
  Change: 152200430
* Merge changes from github. (Dandelion Mané, 2017-03-10)
  Change: 149800363
* [XLA] [StreamExecutor] Tune GEMMs when possible. (Justin Lebar, 2017-03-02)
  cuBLAS 8 adds the cublasGemmEx function, which lets you specify an explicit "algorithm" for the computation. This functions as an opaque tuning hint to cuBLAS.
  This patch adds support for cublasGemmEx to StreamExecutor, and wires up XLA's GemmThunk to use the new function. This patch does not add GEMM autotuning support in TensorFlow proper, only XLA.
  Change: 149068961
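  A hedged sketch of a cublasGemmEx call with an explicit algorithm (wrapper name illustrative; as of CUDA 8 the compute type was a cudaDataType_t):

    #include <cublas_v2.h>

    // Single-precision GEMM with an explicit algorithm hint. Tuning
    // amounts to timing candidate cublasGemmAlgo_t values (including
    // CUBLAS_GEMM_DFALT) and keeping the fastest.
    cublasStatus_t TunedSgemm(cublasHandle_t handle, int m, int n, int k,
                              const float *A, const float *B, float *C,
                              cublasGemmAlgo_t algo) {
      const float alpha = 1.0f, beta = 0.0f;
      return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                          A, CUDA_R_32F, /*lda=*/m,
                          B, CUDA_R_32F, /*ldb=*/k, &beta,
                          C, CUDA_R_32F, /*ldc=*/m,
                          /*computeType=*/CUDA_R_32F, algo);
    }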
* [XLA:GPU] Cache GPU substreams across executions (A. Unique TensorFlower, 2017-03-02)
  Change: 149063035
* Add options argument for DNN activation (A. Unique TensorFlower, 2017-01-24)
  This is useful for platform-dependent functionality.
  Change: 145432435
* Add convolve quantized ops to StreamExecutor API (A. Unique TensorFlower, 2017-01-19)
  Change: 144996696
* Add several operations to the StreamExecutor API (A. Unique TensorFlower, 2017-01-17)
  No implementations are yet provided for these operations.
  Change: 144743665