| Commit message | Author | Age |
corresponding bugs fixed. The bugs that
were worked around have been fixed and verified.
PiperOrigin-RevId: 215497418
PiperOrigin-RevId: 215331087
The tests are in the next patch.
PiperOrigin-RevId: 214362688
PiperOrigin-RevId: 214075796
With the exception of StrCat, all of these already use absl; this change
just removes one layer of indirection.
PiperOrigin-RevId: 213846036
PiperOrigin-RevId: 213693027
This broke in a recent refactoring.
PiperOrigin-RevId: 213497416
* bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
* double is supported but not exposed through the public interface.
* DoFusedConvolveImpl has duplicated information in its template parameter list.
PiperOrigin-RevId: 213308435
PiperOrigin-RevId: 212684548
cuDNN 7.1.4 and 7.2 have a non-deterministic bug if the buffer is not zeroed.
PiperOrigin-RevId: 211905127
PiperOrigin-RevId: 211639440
- Maintain functionality; just move the CalculateOccupancy() and CompareOccupancy() methods from device_description to cuda_gpu_executor
- Remove the CUDA requirement from the general device_description class
There are several API migrations happening:
* ArraySlice's sub-slice constructor => .subspan
* MutableArraySlice's container pointer constructor => absl::MakeSpan
PiperOrigin-RevId: 210946124
- Replace references to the UnqueryableDeviceParams struct with calls to CUDA's built-in occupancy calculation functions
- Update calls to the occupancy checking functions with the new changes
- These changes should provide more long-term reliability and remove the need to manually update hardcoded data values for new GPU architectures
PiperOrigin-RevId: 210596417
This will make it easier to replace tensorflow::StringPiece with absl::string_view, as absl::string_view does not contain a ToString method.
PiperOrigin-RevId: 210550029
PiperOrigin-RevId: 210127626
error state
HostCallbacks may trigger notifications that, if elided, would cause programs to hang. Ideally we would have errback semantics, but this is a band-aid while the semantics are redefined.
PiperOrigin-RevId: 209818770
That is, instances of sp.ToString() are replaced with string(sp).
This will allow tensorflow::StringPiece::ToString to be removed, which is necessary before it can be replaced with absl::string_view.
PiperOrigin-RevId: 209806694
This check-fail was wrong anyway; it meant to check the *substream's* status, but checked its own instead. We could be in an error state, and that's absolutely fine; we shouldn't kill the process for this.
PiperOrigin-RevId: 209721359
PiperOrigin-RevId: 209679086
PiperOrigin-RevId: 208565050
PiperOrigin-RevId: 208508212
PiperOrigin-RevId: 208505669
PiperOrigin-RevId: 208200028
will correctly wait for all computations to complete on an XLA device before termination.
[TF:XLA] Change the XlaTensor definition event to be a shared pointer to a stream_executor::Event. This allows many tensors to share the same definition event.
PiperOrigin-RevId: 208128264
PiperOrigin-RevId: 207983992
ROCmSoftwarePlatform:upstream-staging-stream-executor-algorithmconfig-profileresult
PiperOrigin-RevId: 207801599
PiperOrigin-RevId: 207714420
The old code ensured that failed sub-streams would not be re-used, but
had two flaws:
1) It only checked for failed sub-streams during Return.
2) It didn't actually remove the failed sub-streams from our state.
The new code fixes these two flaws, and adds an extra test that
explains why (1) is insufficient.
PiperOrigin-RevId: 207333296
It's unfortunate that this was only added in 9.1, but I haven't found a good
way of emulating the behavior on 9.0 without falling back to non-batched gemms.
PiperOrigin-RevId: 207286575
This is mostly a huge amount of plumbing just to call into the cublas functions.
cublasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
Add one field, scratch_size_, to AlgorithmDesc. The field is set by
DNN libraries during the algorithm finding / profiling stage; for algorithms
not using scratch memory the field is zero.
Change the CUDA StreamExecutor implementation to set this field properly.
It's possible for an already-existing context to be returned by
cuDevicePrimaryCtxRetain. Previously, this would be handled incorrectly
by CreatedContexts::Add, which was assuming that inserts into the map
always succeeded.
This makes XLA work with
TF_CUDA_PLATFORM_GPU_DEVICE_SCHEDULE=blocking_sync, although exactly how
that flag is related to this bug is unclear to me. It seems like some
sort of race condition, maybe?
PiperOrigin-RevId: 207010059
Comment-only change
PiperOrigin-RevId: 206957994
This makes it easier to see why this function fails.
PiperOrigin-RevId: 206856975
I verified that CUDA 9.1 did not introduce any new algorithms.
PiperOrigin-RevId: 206850523
CUBLAS_GEMM_ALGO{3,4}_TENSOR_OP.
These appear to have been omitted by mistake.
PiperOrigin-RevId: 206843312
PiperOrigin-RevId: 206595861
Signed-off-by: CUI Wei <ghostplant@qq.com>
It's only non-empty if we were able to run ptxas. If the PTX is going to be
JIT'ed by the driver it won't be around. Loading an empty cubin will result in
a fatal error.
PiperOrigin-RevId: 206341931
When running with multiple devices, using the wrong context will lead to
a check-fail when trying to set a stream that has been created with a different
context.
This resolves a check-fail on resnet50 with 8 GPUs.
PiperOrigin-RevId: 206274741
This code has been here since 2014; now that the oldest supported version of
CUDA is 8, cuGetErrorName should always be available. Also, the hardcoded list
of errors is (of course) out of sync with upstream CUDA.
Also surface the description of the error to the user, if available.
PiperOrigin-RevId: 206191424
Streams have a monotonic state machine; if a stream encounters any
error, it will remain in an error state forever. Without this change,
a previously failed sub-stream will be put back on sub_streams_, only
to cause the next usage of the sub-stream to trivially fail.
PiperOrigin-RevId: 206112024
PiperOrigin-RevId: 205885304
This will be used in a future CL.
PiperOrigin-RevId: 205742731