path: root/tensorflow/stream_executor/cuda
* Disable the cuDNN workarounds if the version number is new enough to have the
  corresponding bugs fixed. (Tim Shen, 2018-10-02)
  The bugs that were worked around have been fixed and verified.
  PiperOrigin-RevId: 215497418
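
  A minimal sketch of this kind of version gating, with hypothetical helper
  names and an assumed fix version (illustrative only, not the TensorFlow code):

      // Hypothetical sketch: run a workaround only on cuDNN builds known to
      // carry the bug; newer releases are assumed to have the fix.
      struct CudnnVersion { int major, minor, patch; };

      bool NeedsWorkaround(const CudnnVersion& v) {
        int encoded = v.major * 1000 + v.minor * 100 + v.patch;
        return encoded < 7300;  // assumption: the fix shipped in 7.3.0
      }
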
* Merge pull request #21958 from MattConley:CudaOccupancy (TensorFlower Gardener, 2018-10-01)
  PiperOrigin-RevId: 215331087
* Move winograd algorithm workaround to stream executor. (Tim Shen, 2018-09-21)
  PiperOrigin-RevId: 214075796
* [SE] Restore int8x4 data types if that's the requested DataLayout for fused conv. (Benjamin Kramer, 2018-09-18)
  This broke in a recent refactoring.
  PiperOrigin-RevId: 213497416
* Fix and complete StreamExecutor's DoFusedConvolve: (Tim Shen, 2018-09-17)
  * bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
  * double is supported but not exposed through the public interface.
  * DoFusedConvolveImpl has duplicated information in its template parameter list.
  PiperOrigin-RevId: 213308435
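
  A sketch of the kind of fix the first bullet calls for, deriving the cuDNN
  enum from the element type rather than hard-coding it (a simplified
  illustration, not the actual descriptor-wrapper code):

      #include <cudnn.h>

      // Map the bias element type to the matching cuDNN data type.
      template <typename T> cudnnDataType_t CudnnDataType();
      template <> cudnnDataType_t CudnnDataType<float>()  { return CUDNN_DATA_FLOAT; }
      template <> cudnnDataType_t CudnnDataType<double>() { return CUDNN_DATA_DOUBLE; }

      template <typename BiasType>
      cudnnStatus_t SetBiasDescriptor(cudnnTensorDescriptor_t bias_nd, int channels) {
        return cudnnSetTensor4dDescriptor(bias_nd, CUDNN_TENSOR_NCHW,
                                          CudnnDataType<BiasType>(),  // was CUDNN_DATA_FLOAT
                                          /*n=*/1, /*c=*/channels, /*h=*/1, /*w=*/1);
      }
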
* Zero out the result buffer for strided conv backward filter for NHWC layouts. (Tim Shen, 2018-09-06)
  cuDNN 7.1.4 and 7.2 have a non-deterministic bug if the buffer is not zeroed.
  PiperOrigin-RevId: 211905127
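
  In StreamExecutor terms the workaround amounts to a memzero of the output
  allocation before the convolution is enqueued; a minimal sketch (ThenMemZero
  is the real Stream API, the variable names are illustrative):

      // Clear the backward-filter output first, so the affected cuDNN
      // releases cannot leave non-deterministic garbage in it.
      stream->ThenMemZero(&filter_backprop, filter_backprop.size());
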
* Fully fixed clang errors (Matt Conley, 2018-09-06)
* Fixed clang formatting (Matt Conley, 2018-09-06)
* Recommended typo fix (Matt Conley, 2018-09-04)
* Fixed transition typo (Matt Conley, 2018-09-04)
* Move CUDA-specific occupancy calculation into proper file (Matt Conley, 2018-09-04)
  - Maintain functionality; just move the CalculateOccupancy() and
    CompareOccupancy() methods from device_description to cuda_gpu_executor.
  - Remove the CUDA requirement from the general device_description class.
* Remove (Mutable)ArraySlice implementation and alias them to absl::Span. (Tim Shen, 2018-08-30)
  There are several API migrations happening:
  * ArraySlice's sub-slice constructor => .subspan
  * MutableArraySlice's container pointer constructor => absl::MakeSpan
  PiperOrigin-RevId: 210946124
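
  The two migrations look roughly like this in caller code (a sketch against
  absl::Span's actual API; the variable names are illustrative):

      #include <vector>
      #include "absl/types/span.h"

      void Example(std::vector<int>& v) {
        absl::Span<const int> s = v;
        // Old: ArraySlice<int>(s, pos, len)  =>  New: subspan.
        absl::Span<const int> sub = s.subspan(/*pos=*/1, /*len=*/2);
        // Old: MutableArraySlice<int>(&v)    =>  New: absl::MakeSpan.
        absl::Span<int> mutable_view = absl::MakeSpan(v);
        (void)sub; (void)mutable_view;
      }
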
* Update GPU occupancy checking to utilize CUDA's occupancy calculator functions (Matt Conley, 2018-08-28)
  - Replace references to the UnqueryableDeviceParams struct with calls to
    CUDA's built-in occupancy calculation functions.
  - Update calls to the occupancy checking functions accordingly.
  - The changes should provide more long-term reliability and remove the need
    to manually update hardcoded data values for new GPU architectures.
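
  The CUDA driver exposes this directly; a minimal sketch of querying
  occupancy for a loaded kernel (the driver call is real, the wrapper is
  illustrative):

      #include <cuda.h>

      // Ask the driver how many blocks of `func` can be resident per SM for
      // a given block size and dynamic shared-memory usage.
      int MaxActiveBlocksPerSM(CUfunction func, int block_size, size_t dyn_smem) {
        int num_blocks = 0;
        cuOccupancyMaxActiveBlocksPerMultiprocessor(&num_blocks, func,
                                                    block_size, dyn_smem);
        return num_blocks;
      }
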
* Replaced calls to tensorflow::StringPiece::ToString with string conversions. (A. Unique TensorFlower, 2018-08-22)
  That is, instances of sp.ToString() are replaced with string(sp). This will
  allow tensorflow::StringPiece::ToString to be removed, which is necessary
  before it can be replaced with absl::string_view.
  PiperOrigin-RevId: 209806694
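
  The pattern of the migration, sketched against absl::string_view (the type
  StringPiece was converging on; string_view has no ToString, only an explicit
  conversion):

      #include <string>
      #include "absl/strings/string_view.h"

      void Example(absl::string_view sp) {
        // Old: std::string s = sp.ToString();   // StringPiece-only method.
        std::string s = std::string(sp);         // works for both types.
        (void)s;
      }
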
* Merge pull request #20708 from ROCmSoftwarePlatform:upstream-staging-stream-executor-algorithmconfig-profileresult (TensorFlower Gardener, 2018-08-07)
  PiperOrigin-RevId: 207801599
* [XLA:GPU] Add a fast version of gemmStridedBatched for CUDA 9.1. (Benjamin Kramer, 2018-08-03)
  It's unfortunate that this was only added in 9.1, but I haven't found a good
  way of emulating the behavior on 9.0 without falling back to non-batched gemms.
  PiperOrigin-RevId: 207286575
* [XLA:GPU] Use strided batched gemm instead of building pointer tables. (Benjamin Kramer, 2018-08-03)
  This is mostly a huge amount of plumbing just to call into the cublas
  functions. blasGemmStridedBatched has been available since CUDA 8.0. For
  autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2,
  so I didn't wire that up yet.
  PiperOrigin-RevId: 207285707
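
  For reference, the cuBLAS entry point this plumbs into replaces per-batch
  pointer tables with a base pointer plus a stride per operand; a minimal
  sketch (real API, simplified non-transposed shapes):

      #include <cublas_v2.h>

      // C[i] = alpha * A[i] * B[i] + beta * C[i] for i in [0, batch), where
      // operand i lives at base + i * stride; no pointer table needed.
      void StridedBatchedGemm(cublasHandle_t handle, int m, int n, int k,
                              const float* a, const float* b, float* c,
                              int batch) {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                                  &alpha,
                                  a, /*lda=*/m, /*strideA=*/(long long)m * k,
                                  b, /*ldb=*/k, /*strideB=*/(long long)k * n,
                                  &beta,
                                  c, /*ldc=*/m, /*strideC=*/(long long)m * n,
                                  batch);
      }
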
* Add scratch memory size in AlgorithmDesc (Wen-Heng (Jack) Chung, 2018-08-02)
  Add one field, scratch_size_, to AlgorithmDesc. The field would be set by
  DNN libraries during the algorithm finding / profiling stage. For algorithms
  not using scratch memory the field would be zero. Change the CUDA
  StreamExecutor implementation to set this field properly.
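
  A simplified sketch of the shape of such a field on the descriptor (the real
  stream_executor class has more members; this is illustrative only):

      #include <cstddef>
      #include <cstdint>

      class AlgorithmDesc {
       public:
        AlgorithmDesc(int64_t algo_id, bool tensor_ops_enabled, size_t scratch_size)
            : algo_id_(algo_id), tensor_ops_enabled_(tensor_ops_enabled),
              scratch_size_(scratch_size) {}
        // Zero for algorithms that use no scratch memory.
        size_t scratch_size() const { return scratch_size_; }

       private:
        int64_t algo_id_;
        bool tensor_ops_enabled_;
        size_t scratch_size_;  // set during algorithm finding / profiling
      };
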
* [SE] Allow context reuse in CreatedContexts::Add. (Justin Lebar, 2018-08-01)
  It's possible for an already-existing context to be returned by
  cuDevicePrimaryCtxRetain. Previously, this was handled incorrectly by
  CreatedContexts::Add, which assumed that inserts into the map always
  succeeded.
  This makes XLA work with TF_CUDA_PLATFORM_GPU_DEVICE_SCHEDULE=blocking_sync,
  although exactly how that flag is related to this bug is unclear to me. It
  seems like some sort of race condition, maybe?
  PiperOrigin-RevId: 207010059
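
  The failure mode is generic C++: std::map::emplace returns an
  (iterator, inserted) pair, and ignoring the bool assumes every insert is
  new. A sketch of the corrected pattern, with stand-in types for the real
  context wrapper (hypothetical names):

      #include <map>
      #include <memory>

      struct Context {};             // stand-in for the real context wrapper
      using ContextHandle = void*;   // stand-in for CUcontext

      // Return the existing wrapper when the driver hands back a context we
      // have already seen, instead of assuming emplace() inserted a new one.
      Context* Add(std::map<ContextHandle, std::unique_ptr<Context>>* live,
                   ContextHandle ctx) {
        auto result = live->emplace(ctx, nullptr);
        if (result.second) {  // newly inserted: create the wrapper
          result.first->second = std::make_unique<Context>();
        }
        return result.first->second.get();
      }
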
* [SE] Add an nvbugs link. (Justin Lebar, 2018-08-01)
  Comment-only change.
  PiperOrigin-RevId: 206957994
* [SE] Add additional log statements to DoBlasGemmWithAlgorithmImpl. (Justin Lebar, 2018-07-31)
  This makes it easier to see why this function fails.
  PiperOrigin-RevId: 206856975
* [SE] Add new cublas algorithms from CUDA 9.2. (Justin Lebar, 2018-07-31)
  I verified that CUDA 9.1 did not introduce any new algorithms.
  PiperOrigin-RevId: 206850523
* [SE] Add missing cublas algorithms for CUDA 9.0, CUBLAS_GEMM_ALGO{3,4}_TENSOR_OP. (Justin Lebar, 2018-07-31)
  These appear to have been omitted by mistake.
  PiperOrigin-RevId: 206843312
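
  These enum values feed cuBLAS's algorithm-selecting GEMM entry point; a
  minimal sketch of passing one explicitly (real cuBLAS API; FP16 in/out with
  FP32 accumulation, non-transposed shapes assumed):

      #include <cublas_v2.h>
      #include <cuda_fp16.h>

      // Run a mixed-precision GEMM with an explicitly chosen tensor-op algorithm.
      cublasStatus_t GemmWithAlgo(cublasHandle_t handle, int m, int n, int k,
                                  const __half* a, const __half* b, __half* c) {
        const float alpha = 1.0f, beta = 0.0f;
        return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                            &alpha, a, CUDA_R_16F, /*lda=*/m,
                            b, CUDA_R_16F, /*ldb=*/k,
                            &beta, c, CUDA_R_16F, /*ldc=*/m,
                            /*computeType=*/CUDA_R_32F,
                            CUBLAS_GEMM_ALGO3_TENSOR_OP);
      }
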
* Set the correct context when calling cudnnCreate. (A. Unique TensorFlower, 2018-07-26)
  When running with multiple devices, using the wrong context will lead to a
  check-fail when trying to set a stream that has been created with a
  different context.
  This resolves a check-fail on resnet50 with 8 GPUs.
  PiperOrigin-RevId: 206274741
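
  The underlying rule is a CUDA driver one: the cuDNN handle binds to
  whichever context is current on the calling thread. A sketch of pinning the
  intended context first (the driver and cuDNN calls are real; the wrapper is
  illustrative):

      #include <cuda.h>
      #include <cudnn.h>

      // Make the device's context current before cudnnCreate, so the handle
      // is bound to the same context later used for its streams.
      cudnnStatus_t CreateHandleInContext(CUcontext ctx, cudnnHandle_t* handle) {
        CUcontext previous = nullptr;
        cuCtxGetCurrent(&previous);
        cuCtxSetCurrent(ctx);
        cudnnStatus_t status = cudnnCreate(handle);
        cuCtxSetCurrent(previous);  // restore the caller's context
        return status;
      }
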
* [SE] Try again to query the GPU driver for error descriptions. (Benjamin Kramer, 2018-07-26)
  This code has been here since 2014; now the oldest supported version of CUDA
  is 8, so cuGetErrorName should always be available. Also, the list of errors
  is (of course) out of sync with upstream CUDA.
  Also surface the description of the error to the user, if available.
  PiperOrigin-RevId: 206191424
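
  Both driver queries are real APIs (available since CUDA 6.0); a minimal
  sketch of formatting an error with them:

      #include <cuda.h>
      #include <string>

      // Turn a CUresult into "NAME: description", with fallbacks for codes
      // the driver does not recognize.
      std::string DescribeCuError(CUresult result) {
        const char* name = nullptr;
        const char* description = nullptr;
        if (cuGetErrorName(result, &name) != CUDA_SUCCESS) name = "UNKNOWN";
        if (cuGetErrorString(result, &description) != CUDA_SUCCESS)
          description = "no description available";
        return std::string(name) + ": " + description;
      }
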
* Teach StreamExecutor to load modules and resolve symbols in them. (Sanjoy Das, 2018-07-23)
  This will be used in a future CL.
  PiperOrigin-RevId: 205742731
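
  On the CUDA side this maps onto the driver's module API; a minimal sketch of
  loading a compiled image and resolving a device global in it (real driver
  calls; the image is assumed to already be in memory):

      #include <cuda.h>

      // Load a module image (PTX or cubin) and look up a __device__ global
      // symbol inside it.
      CUresult LoadAndResolve(const void* image, const char* symbol_name,
                              CUdeviceptr* symbol_ptr, size_t* symbol_bytes) {
        CUmodule module = nullptr;
        CUresult result = cuModuleLoadData(&module, image);
        if (result != CUDA_SUCCESS) return result;
        return cuModuleGetGlobal(symbol_ptr, symbol_bytes, module, symbol_name);
      }
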
* Automated rollback of commit 36a66347e8e344cddee4a8d9123ccbcae40011b1 (A. Unique TensorFlower, 2018-07-18)
  PiperOrigin-RevId: 205164273
* Merge pull request #20675 from ROCmSoftwarePlatform:upstream-staging-stream-executor (TensorFlower Gardener, 2018-07-18)
  PiperOrigin-RevId: 205140328
* Support identity activation function in Cudnn implementation of fused conv2d bias activation. (A. Unique TensorFlower, 2018-07-17)
  PiperOrigin-RevId: 205008958
* Error on some documented invalid Cudnn inputs. Cudnn should have returned
  errors, but crashes instead. (A. Unique TensorFlower, 2018-07-17)
  PiperOrigin-RevId: 205000883
* Merge pull request #20706 from ROCmSoftwarePlatform:upstream-staging-stream-executor-pooling-interface (TensorFlower Gardener, 2018-07-16)
  PiperOrigin-RevId: 204805678
* [ROCm] Interface changes for StreamExecutor to support both CUDA and ROCm (Wen-Heng (Jack) Chung, 2018-07-12)
  1) StreamInterface::CudaStreamMemberHack()
     Despite the fact that StreamExecutor and the GPU common runtime are
     largely orthogonal, this particular routine in StreamExecutor is used in
     the GPU common runtime and a couple of other operators. In this commit it
     is renamed to StreamInterface::GpuStreamMemberHack() and its call sites
     are changed as well.
  2) StreamExecutorInterface::CudaContextHack()
     This member is renamed to StreamExecutorInterface::GpuContextHack().
  Changes introduced in this commit include:
  - some StreamExecutor interfaces and their CUDA implementation
  - GPU common runtime changes related to the interface changes in StreamExecutor
  - operators affected by the interface changes in StreamExecutor
* [ROCm] Interface changes for pooling APIs in StreamExecutor (Wen-Heng (Jack) Chung, 2018-07-11)
  Due to the design of MIOpen, the DNN library on the ROCm platform, an
  instance of ScratchAllocator has to be passed into pooling routines. This
  commit addresses such interface changes and the implementation in the CUDA
  StreamExecutor.
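
  The shape of the change is an extra allocator parameter threaded through the
  DNN interface; an illustrative sketch with simplified types (not the exact
  stream_executor declarations, which carry pooling and batch descriptors):

      class Stream;
      class ScratchAllocator;  // hands out temporary device memory
      template <typename T> class DeviceMemory;

      class DnnSupport {
       public:
        virtual ~DnnSupport() = default;
        // After this change, pooling takes a ScratchAllocator so MIOpen can
        // request workspace memory; the cuDNN path may simply ignore it.
        virtual bool DoPoolForward(Stream* stream,
                                   const DeviceMemory<float>& input_data,
                                   DeviceMemory<float>* output_data,
                                   ScratchAllocator* workspace_allocator) = 0;
      };
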
* Workaround the cudnn 7.1.4 correctness bug, where the workspace is required to be zeroed. (A. Unique TensorFlower, 2018-07-02)
  PiperOrigin-RevId: 203001311
* Improve filter for cuBLAS bug. (A. Unique TensorFlower, 2018-06-19)
  PiperOrigin-RevId: 201239428
* Rollback of changelist 200200356. We might want to support GPUs on MacOS again in the future. (A. Unique TensorFlower, 2018-06-19)
  Users are interested in making it work and we don't want to be in the way.
  PiperOrigin-RevId: 201214857
* Fix a build failure when the CUDA version is less than 9000 (i.e. older than CUDA 9.0). (A. Unique TensorFlower, 2018-06-13)
  PiperOrigin-RevId: 200432478
* Detect configurations that would hit a bug in cuBLAS and report an error. (A. Unique TensorFlower, 2018-06-13)
  PiperOrigin-RevId: 200411493
* Remove OS X code from the CUDA stream executor because that platform is no longer supported. (A. Unique TensorFlower, 2018-06-12)
  PiperOrigin-RevId: 200200356
* Unify cuDNN descriptor wrapper names. (A. Unique TensorFlower, 2018-06-12)
  No functional changes.
  PiperOrigin-RevId: 200199956
* Detect configurations that would hit bugs in cuDNN and report an error. (A. Unique TensorFlower, 2018-06-08)
  PiperOrigin-RevId: 199780350
* Do not enable tensor ops for cuDNN RNN unless explicitly specified. (A. Unique TensorFlower, 2018-06-05)
  PiperOrigin-RevId: 199321021
* Unify error handling in CudnnSupport. (A. Unique TensorFlower, 2018-06-01)
  PiperOrigin-RevId: 198836479
* Fix GPU build on Windows. (Smit Hinsu, 2018-05-29)
  PiperOrigin-RevId: 198513480
* Merge changes from github. (Yifei Feng, 2018-05-24)
  Revert #18413: too many internal test failures due to the name scope change
  caused by this change.
  Revert #18192: cannot use re2::StringPiece internally. Need an alternative
  for the set call. Will pull and clean this up in a separate change.
  PiperOrigin-RevId: 197991247
* Add convolution with NHWC layout to stream executor. (A. Unique TensorFlower, 2018-05-22)
  PiperOrigin-RevId: 197650067
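
  In cuDNN terms, NHWC support boils down to tagging the tensor descriptors
  with the interleaved format; a minimal sketch (real cuDNN API; FP32 assumed):

      #include <cudnn.h>

      // Describe an activation tensor in NHWC ("channels last") layout
      // instead of the default NCHW.
      cudnnStatus_t SetNhwcDescriptor(cudnnTensorDescriptor_t desc,
                                      int n, int h, int w, int c) {
        return cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NHWC,
                                          CUDNN_DATA_FLOAT,
                                          n, c, h, w);  // dims always N,C,H,W
      }
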
* Introduce an option to allocate CUDA unified memory. (Smit Hinsu, 2018-05-21)
  PiperOrigin-RevId: 197490523
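
  Unified memory comes from a dedicated driver allocator; a minimal sketch of
  the call involved (real driver API; error handling reduced to a bool):

      #include <cuda.h>

      // Allocate managed (unified) memory, accessible from both host and
      // device and migrated on demand by the driver.
      bool AllocateUnified(size_t bytes, CUdeviceptr* out) {
        return cuMemAllocManaged(out, bytes, CU_MEM_ATTACH_GLOBAL) == CUDA_SUCCESS;
      }
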
* Rollforward of CL 197167501, without enabling CUDNN_FFT_TILING_FORWARD because that breaks XLA tests. (A. Unique TensorFlower, 2018-05-20)
  PiperOrigin-RevId: 197328103
* Automated g4 rollback of changelist 197118212 (A. Unique TensorFlower, 2018-05-18)
  PiperOrigin-RevId: 197167501
* Dropping support for CUDA < 8. (A. Unique TensorFlower, 2018-05-18)
  PiperOrigin-RevId: 197137612