| Commit message (Collapse) | Author | Age |
* bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
* double is supported but not exposed through the public interface.
* DoFusedConvolveImpl has duplicated information in its template parameter list.
PiperOrigin-RevId: 213308435
error state
HostCallbacks may trigger notifications that, if elided, would cause programs to hang. Ideally we would have errback semantics, but this is a band-aid while the semantics are redefined.
PiperOrigin-RevId: 209818770
This check-fail was wrong anyway: it was meant to check the *substream's* status, but checked the parent stream's own status instead. Being in an error state is absolutely fine, and we shouldn't kill the process for it.
PiperOrigin-RevId: 209721359
PiperOrigin-RevId: 207714420
The old code ensured that failed sub-streams would not be re-used, but
had two flaws:
1) It only checked for failed sub-streams during Return.
2) It didn't actually remove the failed sub-streams from our state.
The new code fixes these two flaws, and adds an extra test that
explains why (1) is insufficient.
PiperOrigin-RevId: 207333296
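
The two flaws described above can be illustrated with a minimal sketch. `FakeStream` and `SubStreamPool` are hypothetical names invented for this example, not the StreamExecutor classes, and the scenario of a sub-stream failing while it sits idle in the pool is only one plausible reading of why checking during Return alone is insufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a stream: once marked failed, ok() stays false.
class FakeStream {
 public:
  bool ok() const { return ok_; }
  void MarkFailed() { ok_ = false; }

 private:
  bool ok_ = true;
};

// Hypothetical pool of reusable sub-streams.
class SubStreamPool {
 public:
  // Hands out a healthy idle sub-stream, or creates a new one.
  FakeStream* Get() {
    for (std::size_t i = 0; i < entries_.size();) {
      Entry& entry = entries_[i];
      if (entry.in_use) {
        ++i;
        continue;
      }
      if (!entry.stream->ok()) {
        // Check on Get as well: the stream may have failed while idle.
        // Erasing also removes the failed stream from the pool's state.
        entries_.erase(entries_.begin() + i);
        continue;
      }
      entry.in_use = true;
      return entry.stream.get();
    }
    entries_.push_back({std::make_unique<FakeStream>(), true});
    return entries_.back().stream.get();
  }

  // Makes a healthy sub-stream reusable; a failed one is dropped entirely.
  void Return(FakeStream* stream) {
    assert(stream != nullptr);
    for (std::size_t i = 0; i < entries_.size(); ++i) {
      if (entries_[i].stream.get() != stream) continue;
      if (stream->ok()) {
        entries_[i].in_use = false;
      } else {
        entries_.erase(entries_.begin() + i);
      }
      return;
    }
  }

  std::size_t size() const { return entries_.size(); }

 private:
  struct Entry {
    std::unique_ptr<FakeStream> stream;
    bool in_use;
  };
  std::vector<Entry> entries_;
};
```

In this sketch a failed sub-stream is never handed out again, regardless of whether the failure is noticed in `Get` or in `Return`.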
This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
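
For readers unfamiliar with strided batched GEMM, the following CPU reference sketches the computation as cuBLAS documents it for the non-transposed case: each batch entry is a column-major matrix located at a fixed element stride from the previous one. The function name `GemmStridedBatchedRef` is made up for this illustration; it is not part of StreamExecutor or cuBLAS.

```cpp
#include <cassert>
#include <vector>

// CPU reference sketch (hypothetical helper, column-major, no transposes).
// For each batch index i:
//   C_i = alpha * A_i * B_i + beta * C_i
// where A_i starts at a + i * stride_a (m x k, leading dimension lda),
// B_i at b + i * stride_b (k x n, ldb), and C_i at c + i * stride_c (m x n, ldc).
void GemmStridedBatchedRef(int m, int n, int k, float alpha,
                           const float* a, int lda, long long stride_a,
                           const float* b, int ldb, long long stride_b,
                           float beta, float* c, int ldc, long long stride_c,
                           int batch_count) {
  assert(lda >= m && ldb >= k && ldc >= m);
  for (int batch = 0; batch < batch_count; ++batch) {
    const float* a_i = a + batch * stride_a;
    const float* b_i = b + batch * stride_b;
    float* c_i = c + batch * stride_c;
    for (int col = 0; col < n; ++col) {
      for (int row = 0; row < m; ++row) {
        float acc = 0.0f;
        for (int p = 0; p < k; ++p) {
          acc += a_i[row + p * lda] * b_i[p + col * ldb];
        }
        c_i[row + col * ldc] = alpha * acc + beta * c_i[row + col * ldc];
      }
    }
  }
}
```

Because every matrix lives in one allocation at a regular stride, a single call covers the whole batch without building an array of per-matrix pointers.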
PiperOrigin-RevId: 206595861
Streams have a monotonic state machine; if a stream encounters any
error, it will remain in an error state forever. Without this change,
a previously failed sub-stream will be put back on sub_streams_, only
to cause the next usage of the sub-stream to trivially fail.
PiperOrigin-RevId: 206112024
PiperOrigin-RevId: 205885304
ROCmSoftwarePlatform:upstream-staging-stream-executor-pooling-interface
PiperOrigin-RevId: 204805678
PiperOrigin-RevId: 204327453
PiperOrigin-RevId: 204326206
Due to the design of MIOpen, the DNN library on the ROCm platform, an instance of
ScratchAllocator has to be passed into pooling routines. This commit makes the
corresponding interface changes and updates the CUDA StreamExecutor implementation.
|
Instead of having one stream for compute, host-to-device and device-to-host transfers, switch to having separate streams, just like the GPU does.
Add a se::Event field to XlaTensor to allow accurate inter-stream dependencies to be created.
As part of this:
- Fix TransferManager::TransferLiteralFrom/ToDevice to correctly make generated substreams wait on their master stream.
- Fix Stream::BlockHostUntilDone() to not block on or return substreams. This behavior is completely broken and not only nondeterministically returns substreams to the pool but causes indefinite hangs with the HostStream.
PiperOrigin-RevId: 203726543
Whilst the switch statement covers all possible enum values, the compiler still complains that it reaches the end of the function without returning a value.
I add an "unknown" string, mirroring the one in the function just above.
PiperOrigin-RevId: 200452885
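
A minimal sketch of the pattern described above, using a hypothetical `Layout` enum and `LayoutString` function rather than the actual code touched by the change:

```cpp
#include <cassert>
#include <string>

// Hypothetical enum used only for this illustration.
enum class Layout { kRowMajor, kColumnMajor };

std::string LayoutString(Layout layout) {
  switch (layout) {
    case Layout::kRowMajor:
      return "RowMajor";
    case Layout::kColumnMajor:
      return "ColumnMajor";
  }
  // An enum object can hold a value that matches no enumerator (for example
  // after a static_cast), so control can reach this point and the compiler
  // warns if nothing is returned here.
  return "unknown";
}
```

Leaving out a `default:` label keeps the compiler's warning about unhandled enumerators intact, while the trailing return covers out-of-range values.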
Revert #18413. Too many internal test failures due to the name scope change caused by this change.
Revert #18192. Cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
PiperOrigin-RevId: 197991247
Updates all the construction calls for Status, ScopedActivateContext and
mutexes within stream_executor to follow the recommendation in
https://abseil.io/tips/88
PiperOrigin-RevId: 196007931
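
As a rough illustration of that tip as I read it: use `=` when initializing directly with the intended value, and parentheses when construction performs active logic, reserving brace initialization for cases where neither compiles. The `Status` and `Counter` classes below are simplified stand-ins invented for this sketch, not the stream_executor types.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Simplified, hypothetical Status-like type.
class Status {
 public:
  Status() = default;
  Status(int code, std::string message)
      : code_(code), message_(std::move(message)) {}
  bool ok() const { return code_ == 0; }
  const std::string& message() const { return message_; }

 private:
  int code_ = 0;
  std::string message_;
};

class Counter {
 public:
  Status Increment(int limit) {
    // The constructor performs active logic (it locks): use parentheses.
    std::lock_guard<std::mutex> lock(mu_);
    if (value_ >= limit) {
      // Built from arguments rather than "the value itself": parentheses.
      Status error(1, "limit reached");
      return error;
    }
    // Initializing directly with the intended value: use '='.
    int next = value_ + 1;
    value_ = next;
    return Status();
  }

  int value() {
    std::lock_guard<std::mutex> lock(mu_);
    return value_;
  }

 private:
  std::mutex mu_;
  int value_ = 0;
};
```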
This is in preparation of allowing XLA to fuse (A dot b) * alpha where alpha
can be on device instead of just a constant.
PiperOrigin-RevId: 194068597
Step 1 of re-namespace'ing StreamExecutor into ::stream_executor.
This moves everything inside of stream_executor/..., and leaves a
namespace alias into ::perftools::gputools. The next steps will clean
up users to use the new namespace.
This is mostly a mechanical change, but it also includes a bunch of
non-mechanical changes that ideally would be split out into separate
patches. Unfortunately they all sort of need to be shoved in here for
various reasons:
- forward declarations need to be in the same namespace as the actual
types, so we need to change all forward declarations of
StreamExecutor types in this one patch.
- Uses of these forward declarations need to be changed to the new
namespace (or otherwise we need to add a namespace alias to the
relevant header, but this is pretty ugly).
- Various initialization code needs to live in StreamExecutor's "real"
namespace, so all this needs to be changed.
PiperOrigin-RevId: 193256128
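
The namespace-alias approach can be sketched as follows. The `Stream` class here is a toy placeholder with an invented `id()` member, and `StreamId` is a hypothetical caller that has not yet been migrated.

```cpp
#include <cassert>
#include <type_traits>

// The "real" namespace: definitions and forward declarations live here.
namespace stream_executor {

class Stream;  // A forward declaration must name the real namespace.

class Stream {
 public:
  explicit Stream(int id) : id_(id) {}
  int id() const { return id_; }

 private:
  int id_;
};

}  // namespace stream_executor

// The old name survives as an alias, so existing users keep compiling.
namespace perftools {
namespace gputools = ::stream_executor;
}  // namespace perftools

// Both spellings name the same type.
static_assert(std::is_same<perftools::gputools::Stream,
                           stream_executor::Stream>::value,
              "alias and real namespace must agree");

// Hypothetical caller still written against the old namespace.
int StreamId(const perftools::gputools::Stream& stream) { return stream.id(); }
```

Because `perftools::gputools` is only an alias, it cannot be reopened with `namespace gputools { class Stream; }`, which is why forward declarations have to move to the new namespace in the same patch.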
This change does not apply autotuning to the TF cuDNN kernels; it only provides lower-level support.
PiperOrigin-RevId: 191919566
PiperOrigin-RevId: 189954596
ThenDoHostCallback.
The reason this came about is: ThenDoHostCallback was once private, and
ThenDoHostCallbackForTest was public. Then at some point
ThenDoHostCallback became public, but the *ForTest one was never
removed.
PiperOrigin-RevId: 188459741
PiperOrigin-RevId: 188025477
This parameter is not the variance of the data, but rather is
1/sqrt(variance + epsilon). Neglecting epsilon, this is the inverse
standard deviation.
"inv_stddev" might be a better name, but "inv_var" is certainly better
than plain "variance", and it matches nvidia's name for this parameter,
which I think may override the desire for a more precise name.
No functional change.
PiperOrigin-RevId: 179352839
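
To make the distinction concrete, here is a small sketch of the quantity being described. `InvVar` and `Normalize` are hypothetical helper names, and using the value to scale a centered input is an assumption about how such a parameter is typically consumed in batch normalization, not a description of the StreamExecutor code.

```cpp
#include <cassert>
#include <cmath>

// Returns 1 / sqrt(variance + epsilon), the value the parameter holds.
inline double InvVar(double variance, double epsilon) {
  assert(variance + epsilon > 0.0);
  return 1.0 / std::sqrt(variance + epsilon);
}

// Normalizes x by multiplying with the precomputed factor instead of
// dividing by the standard deviation.
inline double Normalize(double x, double mean, double inv_var) {
  return (x - mean) * inv_var;
}
```

With a variance of 4 and epsilon of 0, `InvVar` yields 0.5, the reciprocal of the standard deviation 2, not 4 or 1/4.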
PiperOrigin-RevId: 179213341
The now-deprecated Stream::BlockHostUntilDoneWithStatus remains, to facilitate a
multi-CL renaming transition. Once all callers have been renamed to
BlockHostUntilDone, *WithStatus will be removed.
The StreamExecutor (private) method has also been renamed to BlockHostUntilDone.
It's only used by Stream.
The StreamExecutorInterface method will be renamed in a separate atomic CL.
It's harder to perform that transition gradually, and we've already performed an
atomic change previously, so we might as well fix it up in one shot.
PiperOrigin-RevId: 178907807
Also fixed a deadlock in Stream::BlockHostUntilDone. The problem with the
original code was that it grabbed mu_ before looping over substreams, and would
call CheckError with mu_ still held. But CheckError will attempt to lock mu_ in
the failure case, which would deadlock.
PiperOrigin-RevId: 178191634
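
The deadlock and one way to avoid it can be sketched like this. `SketchStream` is a hypothetical simplification, not the real Stream class: it snapshots the sub-stream list while holding `mu_`, then releases the lock before doing anything that may call `CheckError`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical simplification of a stream that owns sub-streams.
class SketchStream {
 public:
  SketchStream* AddSubStream() {
    std::lock_guard<std::mutex> lock(mu_);
    sub_streams_.push_back(std::make_unique<SketchStream>());
    return sub_streams_.back().get();
  }

  // Test hook: decides whether this stream's own wait succeeds.
  void set_wait_result(bool ok) {
    std::lock_guard<std::mutex> lock(mu_);
    wait_ok_ = ok;
  }

  bool ok() {
    std::lock_guard<std::mutex> lock(mu_);
    return ok_;
  }

  bool BlockHostUntilDone() {
    // Snapshot the sub-streams under the lock, then release it. In this
    // sketch sub-streams are never removed, so the raw pointers stay valid.
    std::vector<SketchStream*> subs;
    {
      std::lock_guard<std::mutex> lock(mu_);
      for (auto& sub : sub_streams_) subs.push_back(sub.get());
    }
    bool all_ok = WaitSelf();
    for (SketchStream* sub : subs) {
      all_ok = sub->BlockHostUntilDone() && all_ok;
    }
    // mu_ is not held here, so CheckError can lock it safely.
    CheckError(all_ok);
    return all_ok;
  }

 private:
  bool WaitSelf() {
    std::lock_guard<std::mutex> lock(mu_);
    return wait_ok_;
  }

  // Locks mu_ only in the failure case. Calling this with mu_ already held
  // would lock a non-recursive mutex twice on the same thread.
  void CheckError(bool operation_ok) {
    if (operation_ok) return;
    std::lock_guard<std::mutex> lock(mu_);
    ok_ = false;
  }

  std::mutex mu_;
  bool ok_ = true;
  bool wait_ok_ = true;
  std::vector<std::unique_ptr<SketchStream>> sub_streams_;
};
```

The success path never exposed the problem because `CheckError` only takes the lock on failure, which is consistent with the bug going unnoticed until a sub-stream actually failed.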
Relax CudnnRNNTestCompatibleRNNCells test error tolerance a bit.
PiperOrigin-RevId: 174495089
Scale, offset, mean, and variance must still be float32 if the input is float16.
PiperOrigin-RevId: 170239448
and support side_input and scaling parameters in float and int8 versions.
PiperOrigin-RevId: 167763219
PiperOrigin-RevId: 166305887
and support side_input and scaling parameters in float and int8 versions.
PiperOrigin-RevId: 166276461
PiperOrigin-RevId: 162748507
PiperOrigin-RevId: 162437318
Currently it is turned off by default.
PiperOrigin-RevId: 162423171
PiperOrigin-RevId: 162412879
to stream_executor.
PiperOrigin-RevId: 161137741
PiperOrigin-RevId: 160292762
PiperOrigin-RevId: 159578357
o Add new DataType kInt8
o Add new DataLayout kBatchDepthYX4 and new FilterLayout kOutputInputYX4, both
of which map to CUDNN_TENSOR_NCHW_VECT_C
o Change (Then|Do)TransformTensor to take input and output element types.
o Add new tensor format FORMAT_NCHW_VECT_C, and change the utility functions in
tensor_format.h to work with the new format.
PiperOrigin-RevId: 158806412
PiperOrigin-RevId: 158142102
PiperOrigin-RevId: 158062553
PiperOrigin-RevId: 157788425
Change: 152200430
Change: 149800363
cublas 8 adds the cublasGemmEx function, which lets you specify an
explicit "algorithm" for the computation. This functions as an opaque
tuning hint to cublas.
This patch adds support for cublasGemmEx to StreamExecutor, and wires up
XLA's GemmThunk to use the new function.
This patch does not add GEMM autotuning support in TensorFlow proper,
only XLA.
Change: 149068961
Change: 149063035
This is useful for platform-dependent functionality.
Change: 145432435
Change: 144996696
No implementations are yet provided for these operations.
Change: 144743665