| Commit message (Collapse) | Author | Age |
|
corresponding bugs fixed. The bugs that
had been worked around were fixed and verified.
PiperOrigin-RevId: 215497418
|
PiperOrigin-RevId: 215331087
|
PiperOrigin-RevId: 214075796
|
This broke in a recent refactoring.
PiperOrigin-RevId: 213497416
|
* bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
* double is supported but not exposed through the public interface.
* DoFusedConvolveImpl has duplicated information in its template parameter list.
PiperOrigin-RevId: 213308435
|
cuDNN 7.1.4 and 7.2 have a non-deterministic bug if the buffer is not zeroed.
PiperOrigin-RevId: 211905127
|
- Maintain functionality, just move CalculateOccupancy() and CompareOccupancy() methods from device_description to cuda_gpu_executor
- Remove CUDA requirement in general class device_description
|
There are several API migrations happening:
* ArraySlice's sub-slice constructor => .subspan
* MutableArraySlice's container pointer constructor => absl::MakeSpan
PiperOrigin-RevId: 210946124
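The two call-site changes listed above can be illustrated with a small stand-in. This is only a sketch: `MiniSpan`, `MakeMiniSpan` and `SumMiddle` are hypothetical names invented here so the example compiles without Abseil, and they model nothing beyond the `.subspan` and `MakeSpan` shapes named in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mutable span type; not the Abseil class.
template <typename T>
class MiniSpan {
 public:
  MiniSpan(T* data, std::size_t size) : data_(data), size_(size) {}

  // Sub-view of up to `len` elements starting at `pos`. In this sketch the
  // length is clamped to the elements that remain after `pos`.
  MiniSpan subspan(std::size_t pos, std::size_t len) const {
    assert(pos <= size_);
    const std::size_t remaining = size_ - pos;
    return MiniSpan(data_ + pos, len < remaining ? len : remaining);
  }

  T& operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};

// Stand-in for a MakeSpan-style factory: the container is passed by
// reference rather than by pointer, which is the call-site change.
template <typename T>
MiniSpan<T> MakeMiniSpan(std::vector<T>& v) {
  return MiniSpan<T>(v.data(), v.size());
}

// Sums every element except the first and the last through a sub-view.
inline int SumMiddle(std::vector<int>& values) {
  assert(values.size() >= 2);
  MiniSpan<int> all = MakeMiniSpan(values);                  // was: a container-pointer constructor
  MiniSpan<int> middle = all.subspan(1, values.size() - 2);  // was: a sub-slice constructor
  int total = 0;
  for (std::size_t i = 0; i < middle.size(); ++i) total += middle[i];
  return total;
}
```

For a vector holding 1 through 5, `SumMiddle` adds 2, 3 and 4.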
|
- Replace references to the UnqueryableDeviceParams struct with calls to CUDA's built-in occupancy calculation functions
- Update calls to the occupancy checking functions with the new changes
- Changes should provide more long-term reliability and will remove the need to manually update hardcoded data values for new GPU architectures
|
That is, instances of sp.ToString() are replaced with string(sp).
This will allow tensorflow::StringPiece::ToString to be removed, which is necessary before it can be replaced with absl::string_view.
PiperOrigin-RevId: 209806694
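A minimal sketch of the replacement pattern, using `std::string_view` (C++17) as a stand-in for the string-piece type. `MakeKey` is a hypothetical helper invented for illustration and does not come from the codebase.

```cpp
#include <cassert>
#include <string>
#include <string_view>  // requires C++17

// Hypothetical helper: builds an owning key from two non-owning views.
inline std::string MakeKey(std::string_view prefix, std::string_view name) {
  assert(!name.empty());
  // Explicit conversion to an owning string; this takes the place of a
  // member call such as prefix.ToString().
  std::string key = std::string(prefix);
  key += '/';
  key += std::string(name);
  return key;
}
```

Because the conversion no longer relies on a member function of the view type, the same call site keeps compiling once the view type is swapped for another one that is explicitly convertible to `std::string`.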
|
ROCmSoftwarePlatform:upstream-staging-stream-executor-algorithmconfig-profileresult
PiperOrigin-RevId: 207801599
|
It's unfortunate that this was only added in 9.1, but I haven't found a good
way of emulating the behavior on 9.0 without falling back to non-batched gemms.
PiperOrigin-RevId: 207286575
|
This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
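For readers unfamiliar with the strided batched layout, here is a CPU reference sketch of the addressing scheme only. It is an assumption-laden illustration: it uses row-major storage for brevity (cuBLAS is column-major), omits the alpha/beta scaling, and `StridedBatchedGemm` is a name made up for this example rather than the plumbing added in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each batch index, computes C_batch = A_batch * B_batch, where matrix X
// of a given batch starts at x + batch * stride_x. All batches therefore live
// in one allocation per operand instead of behind an array of pointers.
inline void StridedBatchedGemm(int m, int n, int k,
                               const float* a, std::size_t stride_a,
                               const float* b, std::size_t stride_b,
                               float* c, std::size_t stride_c,
                               int batch_count) {
  assert(m >= 0 && n >= 0 && k >= 0 && batch_count >= 0);
  for (int batch = 0; batch < batch_count; ++batch) {
    const float* a_b = a + batch * stride_a;  // m x k, row-major
    const float* b_b = b + batch * stride_b;  // k x n, row-major
    float* c_b = c + batch * stride_c;        // m x n, row-major
    for (int i = 0; i < m; ++i) {
      for (int j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (int p = 0; p < k; ++p) acc += a_b[i * k + p] * b_b[p * n + j];
        c_b[i * n + j] = acc;
      }
    }
  }
}
```

With two 2x2 batches packed back to back, each stride is 4 elements.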
|
Add one field, scratch_size_, into AlgorithmDesc. The field would be set by
DNN libraries during algorithm finding / profiling stage. For algorithms not
using scratch memory the field would be zero.
Change CUDA StreamExecutor implementation to set this field properly.
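A rough sketch of how a recorded scratch size could be consumed, under stated assumptions: `AlgoDescSketch` is a hypothetical, simplified descriptor (the real AlgorithmDesc has other members), and `PickAlgorithm` is one plausible caller invented here, not code from this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor: an algorithm id plus the scratch size recorded
// while the algorithm was being found or profiled.
class AlgoDescSketch {
 public:
  AlgoDescSketch(std::int64_t id, std::size_t scratch_size)
      : id_(id), scratch_size_(scratch_size) {}

  std::int64_t id() const { return id_; }
  // Zero for algorithms that do not use scratch memory.
  std::size_t scratch_size() const { return scratch_size_; }

 private:
  std::int64_t id_;
  std::size_t scratch_size_;
};

// Returns the id of the first candidate whose recorded scratch requirement
// fits in `budget` bytes, or -1 if none fits.
inline std::int64_t PickAlgorithm(const std::vector<AlgoDescSketch>& candidates,
                                  std::size_t budget) {
  for (const AlgoDescSketch& desc : candidates) {
    if (desc.scratch_size() <= budget) return desc.id();
  }
  return -1;
}
```

Storing the size next to the algorithm lets a caller filter candidates without re-querying the DNN library.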
|
It's possible for an already-existing context to be returned by
cuDevicePrimaryCtxRetain. Previously, this would be handled incorrectly
by CreatedContexts::Add, which was assuming that inserts into the map
always succeeded.
This makes XLA work with
TF_CUDA_PLATFORM_GPU_DEVICE_SCHEDULE=blocking_sync, although exactly how
that flag is related to this bug is unclear to me. It seems like some
sort of race condition, maybe?
PiperOrigin-RevId: 207010059
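The failure mode described above (treating a map insert as if it always succeeded) can be sketched as follows. Everything here is hypothetical: `ContextSketch` and `CreatedContextsSketch` are invented stand-ins, an `int` takes the place of the driver's context handle, and the actual fix may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical wrapper around a raw context handle.
struct ContextSketch {
  int raw_handle;
};

class CreatedContextsSketch {
 public:
  // Returns the wrapper for `raw_handle`, creating it only on first sight.
  // The bool in emplace's result says whether a new entry was inserted; when
  // it is false, the handle was already registered and the existing wrapper
  // is reused instead of being overwritten or duplicated.
  ContextSketch* Add(int raw_handle) {
    auto result = live_.emplace(raw_handle, std::unique_ptr<ContextSketch>());
    if (result.second) {
      result.first->second.reset(new ContextSketch{raw_handle});
    }
    return result.first->second.get();
  }

  std::size_t size() const { return live_.size(); }

 private:
  std::map<int, std::unique_ptr<ContextSketch>> live_;
};
```

Adding the same handle twice yields the same wrapper pointer and leaves a single entry in the registry.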
|
Comment-only change
PiperOrigin-RevId: 206957994
|
This makes it easier to see why this function fails.
PiperOrigin-RevId: 206856975
|
I verified that CUDA 9.1 did not introduce any new algorithms.
PiperOrigin-RevId: 206850523
|
CUBLAS_GEMM_ALGO{3,4}_TENSOR_OP.
These appear to have been omitted by mistake.
PiperOrigin-RevId: 206843312
|
When running with multiple devices, using the wrong context will lead to
a check-fail when trying to set a stream that has been created with a different
context.
This resolves a check-fail on resnet50 with 8 GPUs.
PiperOrigin-RevId: 206274741
|
This code has been here since 2014; the oldest supported version of CUDA is now
8 so cuGetErrorName should always be available. Also the list of errors is
(of course) out of sync with upstream CUDA.
Also surface the description of the error to the user, if available.
PiperOrigin-RevId: 206191424
|
This will be used in a future CL.
PiperOrigin-RevId: 205742731
|
PiperOrigin-RevId: 205164273
|
ROCmSoftwarePlatform:upstream-staging-stream-executor
PiperOrigin-RevId: 205140328
|
bias activation.
PiperOrigin-RevId: 205008958
|
returned errors, but crashes instead.
PiperOrigin-RevId: 205000883
|
ROCmSoftwarePlatform:upstream-staging-stream-executor-pooling-interface
PiperOrigin-RevId: 204805678
|
1) StreamInterface::CudaStreamMemberHack()
Despite the fact that StreamExecutor and GPU common runtime are largely
orthogonal, this particular routine in StreamExecutor is used in GPU common
runtime and a couple of other operators. In this commit it's renamed to
StreamInterface::GpuStreamMemberHack() and its call sites are changed accordingly.
2) StreamExecutorInterface::CudaContextHack()
This member is renamed to StreamExecutorInterface::GpuContextHack().
Changes introduced in this commit include:
- some StreamExecutor interfaces and CUDA implementation
- GPU common runtime related to interface changes in StreamExecutor
- operators affected by interface changes in StreamExecutor
|
Due to the design of MIOpen, the DNN library on ROCm platform, an instance of
ScratchAllocator has to be passed into pooling routines. This commit addresses
such interface changes and the implementation in CUDA StreamExecutor.
|
to be zeroed.
PiperOrigin-RevId: 203001311
|
PiperOrigin-RevId: 201239428
|
again in the future. Users are interested in making it work and we don't want to be in the way.
PiperOrigin-RevId: 201214857
|
PiperOrigin-RevId: 200432478
|
PiperOrigin-RevId: 200411493
|
longer supported.
PiperOrigin-RevId: 200200356
|
No functional changes.
PiperOrigin-RevId: 200199956
|
PiperOrigin-RevId: 199780350
|
PiperOrigin-RevId: 199321021
|
PiperOrigin-RevId: 198836479
|
PiperOrigin-RevId: 198513480
|
Revert #18413. Too many internal test failures due to the name scope change caused by this change.
Revert #18192. Cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
PiperOrigin-RevId: 197991247
|
PiperOrigin-RevId: 197650067
|
PiperOrigin-RevId: 197490523
|
because that breaks XLA tests.
PiperOrigin-RevId: 197328103
|
PiperOrigin-RevId: 197167501
|
PiperOrigin-RevId: 197137612
|