path: root/tensorflow/stream_executor
* Disable the cuDNN workarounds if the version number is new enough to have the corresponding bugs fixed. (Tim Shen, 2018-10-02)
  The bugs that were worked around have been fixed and verified.
  PiperOrigin-RevId: 215497418
* Merge pull request #21958 from MattConley:CudaOccupancy (TensorFlower Gardener, 2018-10-01)
  PiperOrigin-RevId: 215331087
* Add cuDNN fused convolution forward support. (Tim Shen, 2018-09-24)
  The tests are in the next patch.
  PiperOrigin-RevId: 214362688
* Move the Winograd algorithm workaround to stream executor. (Tim Shen, 2018-09-21)
  PiperOrigin-RevId: 214075796
* [SE] Use absl instead of TF classes where an absl version exists (Benjamin Kramer, 2018-09-20)
  With the exception of StrCat, all of these are using absl already; this change just removes one layer of indirection.
  PiperOrigin-RevId: 213846036
* Added ABSL_DEPRECATED annotations to various deprecated TensorFlow functions. (A. Unique TensorFlower, 2018-09-19)
  PiperOrigin-RevId: 213693027
* [SE] Restore int8x4 data types if that's the requested DataLayout for fused conv (Benjamin Kramer, 2018-09-18)
  This broke in a recent refactoring.
  PiperOrigin-RevId: 213497416
* Fix and complete StreamExecutor's DoFusedConvolve: (Tim Shen, 2018-09-17)
  * bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
  * double is supported but not exposed through the public interface.
  * DoFusedConvolveImpl has duplicated information in its template parameter list.
  PiperOrigin-RevId: 213308435
* Internal change. (Anna R, 2018-09-12)
  PiperOrigin-RevId: 212684548
* Zero out the result buffer for strided conv backward filter for NHWC layouts. (Tim Shen, 2018-09-06)
  cuDNN 7.1.4 and 7.2 have a non-determinism bug if the buffer is not zeroed.
  PiperOrigin-RevId: 211905127
* Fully fixed clang errors (Matt Conley, 2018-09-06)
* Fixed clang formatting (Matt Conley, 2018-09-06)
* Alias tensorflow::gtl::InlinedVector to absl::InlinedVector (Benjamin Kramer, 2018-09-05)
  PiperOrigin-RevId: 211639440
* Recommended typo fix (Matt Conley, 2018-09-04)
* Fixed transition typo (Matt Conley, 2018-09-04)
* Move CUDA-specific occupancy calculation into the proper file (Matt Conley, 2018-09-04)
  - Maintain functionality; just move the CalculateOccupancy() and CompareOccupancy() methods from device_description to cuda_gpu_executor
  - Remove the CUDA requirement from the general device_description class
* Remove (Mutable)ArraySlice implementation and alias them to absl::Span. (Tim Shen, 2018-08-30)
  There are several API migrations happening:
  * ArraySlice's sub-slice constructor => .subspan
  * MutableArraySlice's container pointer constructor => absl::MakeSpan
  PiperOrigin-RevId: 210946124
* Update GPU occupancy checking to utilize CUDA's occupancy calculator functions (Matt Conley, 2018-08-28)
  - Replace references to the UnqueryableDeviceParams struct with calls to CUDA's built-in occupancy calculation functions
  - Update calls to the occupancy checking functions accordingly
  - These changes should provide more long-term reliability and remove the need to manually update hardcoded data values for new GPU architectures
* Removed redundant std::string -> string conversions. (A. Unique TensorFlower, 2018-08-28)
  PiperOrigin-RevId: 210596417
* Removed ToString method from tensorflow::StringPiece. (A. Unique TensorFlower, 2018-08-28)
  This will make it easier to replace tensorflow::StringPiece with absl::string_view, as absl::string_view does not contain a ToString method.
  PiperOrigin-RevId: 210550029
* Removed redundant std::string -> string conversions. (A. Unique TensorFlower, 2018-08-24)
  PiperOrigin-RevId: 210127626
* [SE] Avoid deadlock by calling HostCallbacks even when the stream is in an error state (A. Unique TensorFlower, 2018-08-22)
  HostCallbacks may trigger notifications that, if elided, would cause programs to hang. Ideally we would have errback semantics, but this is a band-aid while the semantics are redefined.
  PiperOrigin-RevId: 209818770
* Replaced calls to tensorflow::StringPiece::ToString with string conversions. (A. Unique TensorFlower, 2018-08-22)
  That is, instances of sp.ToString() are replaced with string(sp). This will allow tensorflow::StringPiece::ToString to be removed, which is necessary before it can be replaced with absl::string_view.
  PiperOrigin-RevId: 209806694
* [SE] Don't CHECK-fail when the stream is not-OK (A. Unique TensorFlower, 2018-08-22)
  This check-fail was wrong anyway; it was meant to check the *substream's* status, but checked its own instead. We could be in an error state, and that's absolutely fine; we shouldn't kill the process for this.
  PiperOrigin-RevId: 209721359
* Fix C++ header guards. (A. Unique TensorFlower, 2018-08-21)
  PiperOrigin-RevId: 209679086
* Merge pull request #20536 from rongjiecomputer:flag (TensorFlower Gardener, 2018-08-13)
  PiperOrigin-RevId: 208565050
* Destroy the task before unblocking its waiters. (Tim Shen, 2018-08-13)
  PiperOrigin-RevId: 208508212
* Automated rollback of commit 56e4ea405d13125a3dcb6459019a83d12330bf84 (Peter Hawkins, 2018-08-13)
  PiperOrigin-RevId: 208505669
* Automated rollback of commit b306f5f9458feddbdb89b7db557cb74dc9408d07 (Peter Hawkins, 2018-08-10)
  PiperOrigin-RevId: 208200028
* [TF:XLA] Add a real implementation of XlaDevice::Sync() so Session::Run() will correctly wait for all computations to complete on an XLA device before termination. (Peter Hawkins, 2018-08-09)
  [TF:XLA] Change the XlaTensor definition event to be a shared pointer to a stream_executor::Event. This allows many tensors to share the same definition event.
  PiperOrigin-RevId: 208128264
* Merge pull request #21232 from ghostplant:fix-typo (TensorFlower Gardener, 2018-08-08)
  PiperOrigin-RevId: 207983992
* Merge pull request #20708 from ROCmSoftwarePlatform:upstream-staging-stream-executor-algorithmconfig-profileresult (TensorFlower Gardener, 2018-08-07)
  PiperOrigin-RevId: 207801599
* Implement DoHostCallbackWithStatus to allow callbacks to return a status (A. Unique TensorFlower, 2018-08-07)
  PiperOrigin-RevId: 207714420
* Drop failed sub-streams during both Get and Return. (Todd Wang, 2018-08-03)
  The old code ensured that failed sub-streams would not be re-used, but had two flaws:
  1) It only checked for failed sub-streams during Return.
  2) It didn't actually remove the failed sub-streams from our state.
  The new code fixes these two flaws, and adds an extra test that explains why (1) is insufficient.
  PiperOrigin-RevId: 207333296
* [XLA:GPU] Add a fast version of gemmStridedBatched for CUDA 9.1 (Benjamin Kramer, 2018-08-03)
  It's unfortunate that this was only added in 9.1, but I haven't found a good way of emulating the behavior on 9.0 without falling back to non-batched gemms.
  PiperOrigin-RevId: 207286575
* [XLA:GPU] Use strided batched gemm instead of building pointer tables. (Benjamin Kramer, 2018-08-03)
  This is mostly a huge amount of plumbing just to call into the cublas functions. blasGemmStridedBatched has been available since CUDA 8.0. For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2, so I didn't wire that up yet.
  PiperOrigin-RevId: 207285707
* Add scratch memory size in AlgorithmDesc (Wen-Heng (Jack) Chung, 2018-08-02)
  Add one field, scratch_size_, into AlgorithmDesc. The field would be set by DNN libraries during the algorithm finding / profiling stage. For algorithms not using scratch memory the field would be zero. Change the CUDA StreamExecutor implementation to set this field properly.
* [SE] Allow context reuse in CreatedContexts::Add. (Justin Lebar, 2018-08-01)
  It's possible for an already-existing context to be returned by cuDevicePrimaryCtxRetain. Previously, this was handled incorrectly by CreatedContexts::Add, which assumed that inserts into the map always succeeded.
  This makes XLA work with TF_CUDA_PLATFORM_GPU_DEVICE_SCHEDULE=blocking_sync, although exactly how that flag is related to this bug is unclear to me. It seems like some sort of race condition, maybe?
  PiperOrigin-RevId: 207010059
* [SE] Add an nvbugs link. (Justin Lebar, 2018-08-01)
  Comment-only change.
  PiperOrigin-RevId: 206957994
* [SE] Add additional log statements to DoBlasGemmWithAlgorithmImpl. (Justin Lebar, 2018-07-31)
  This makes it easier to see why this function fails.
  PiperOrigin-RevId: 206856975
* [SE] Add new cublas algorithms from CUDA 9.2. (Justin Lebar, 2018-07-31)
  I verified that CUDA 9.1 did not introduce any new algorithms.
  PiperOrigin-RevId: 206850523
* [SE] Add missing cublas algorithms for CUDA 9.0, CUBLAS_GEMM_ALGO{3,4}_TENSOR_OP. (Justin Lebar, 2018-07-31)
  These appear to have been omitted by mistake.
  PiperOrigin-RevId: 206843312
* [SE] Ensure we BlockHostUntilDone before we deallocate temporary memory (A. Unique TensorFlower, 2018-07-30)
  PiperOrigin-RevId: 206595861
* Fix typo: host_src -> gpu_src for inter-gpu copy (CUI Wei, 2018-07-29)
  Signed-off-by: CUI Wei <ghostplant@qq.com>
* [XLA:GPU] Only add the cubin if it is available (Benjamin Kramer, 2018-07-27)
  It's only non-empty if we were able to run ptxas. If the PTX is going to be JIT'ed by the driver, it won't be around. Loading an empty cubin will result in a fatal error.
  PiperOrigin-RevId: 206341931
* Set the correct context when calling cudnnCreate. (A. Unique TensorFlower, 2018-07-26)
  When running with multiple devices, using the wrong context leads to a check-fail when trying to set a stream that was created with a different context. This resolves a check-fail on resnet50 with 8 GPUs.
  PiperOrigin-RevId: 206274741
* [SE] Try again to query the GPU driver for error descriptions (Benjamin Kramer, 2018-07-26)
  This code has been here since 2014; now the oldest supported version of CUDA is 8, so cuGetErrorName should always be available. Also, the list of errors is (of course) out of sync with upstream CUDA. Also surface the description of the error to the user, if available.
  PiperOrigin-RevId: 206191424
* Ensure failed sub-streams are not re-used. (Todd Wang, 2018-07-25)
  Streams have a monotonic state machine; if a stream encounters any error, it will remain in an error state forever. Without this change, a previously failed sub-stream would be put back on sub_streams_, only to cause the next usage of the sub-stream to trivially fail.
  PiperOrigin-RevId: 206112024
* Automated rollback of commit 0ea6847c892497afdd20c1150fee1e532612ca17 (A. Unique TensorFlower, 2018-07-24)
  PiperOrigin-RevId: 205885304
* Teach StreamExecutor to load modules and resolve symbols in them (Sanjoy Das, 2018-07-23)
  This will be used in a future CL.
  PiperOrigin-RevId: 205742731