| Commit message | Author | Age |
PiperOrigin-RevId: 211519628
PiperOrigin-RevId: 211519250
* Simplify the contraction by collapsing the inner dims into a single dimension.
* Get rid of the expensive reverse op.
~5X improvement when compiled with AVX.
PiperOrigin-RevId: 211518363
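The committed kernel itself isn't shown here, but the collapsing idea can be illustrated with numpy (shapes are arbitrary, chosen only for the example): contracting over several contiguous inner dimensions is equivalent to reshaping them into one dimension and doing a single matrix product.

```python
import numpy as np

# Illustrative shapes only: contract the two inner dims (3, 5) of `a`
# against the two leading dims of `w`.
a = np.random.rand(4, 3, 5)
w = np.random.rand(3, 5, 6)

# Direct contraction over both inner dimensions.
direct = np.einsum('abc,bcd->ad', a, w)

# Equivalent: collapse the contiguous inner dims into a single dimension,
# then do one plain matrix product.
collapsed = a.reshape(4, 3 * 5) @ w.reshape(3 * 5, 6)

assert np.allclose(direct, collapsed)
```

The single matmul form is what lets a vectorized (e.g. AVX) GEMM kernel do all the work.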
Fixes #21266
PiperOrigin-RevId: 211515918
PiperOrigin-RevId: 211514287
PiperOrigin-RevId: 211514002
PiperOrigin-RevId: 211510051
PiperOrigin-RevId: 211505721
PiperOrigin-RevId: 211505612
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 211502883
PiperOrigin-RevId: 211501909
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
PiperOrigin-RevId: 211500190
PiperOrigin-RevId: 211498364
PiperOrigin-RevId: 211496364
PiperOrigin-RevId: 211496283
monotonic unary functions.
Add the ability to flip Max <-> Min if the function is non-increasing, e.g. Max(Neg(x)) => Neg(Min(x)).
PiperOrigin-RevId: 211490436
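The rewrite rests on a simple identity: for a non-decreasing f, Max(f(x)) = f(Max(x)), and for a non-increasing f, Max(f(x)) = f(Min(x)) (and symmetrically for Min). A quick numpy check of both cases:

```python
import numpy as np

x = np.array([-3.0, -1.0, 2.0, 5.0])

# Neg is non-increasing, so the reduction flips: Max(Neg(x)) == Neg(Min(x)).
assert np.max(-x) == -np.min(x)

# Exp is non-decreasing, so the reduction is preserved:
# Max(Exp(x)) == Exp(Max(x)).
assert np.max(np.exp(x)) == np.exp(np.max(x))
```

The payoff is that the expensive unary function is applied once to a scalar instead of elementwise to the whole input before the reduction.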
HLO transformations would forget to propagate the feature depth attribute.
Making these attributes mandatory, while slightly less convenient for tests,
makes HLO transformations more robust.
PiperOrigin-RevId: 211490160
PiperOrigin-RevId: 211489741
wdirons:21833_disable_gpu_test_scatter_add_ndim_op_test
PiperOrigin-RevId: 211489137
PiperOrigin-RevId: 211488610
PiperOrigin-RevId: 211487989
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
PiperOrigin-RevId: 211480449
metrics combined with while loops.
PiperOrigin-RevId: 211479604
get an owned device mgr from the input session.
One use case is in S4TF, we run a graph session to enqueue a tensor into a fifo
queue, and then call TFE_Execute() on a dequeue op over the same queue, as a way
to transfer a tensor from TF to host (tensor transfer in the other direction also
works).
To make this work, we need the TFE_Context and the TF_Session to use the same
ResourceMgr object (attached to a Device, which is in turn owned by DeviceMgr),
so that both can access the fifo queue resource op.
PiperOrigin-RevId: 211471075
PiperOrigin-RevId: 211469413
PiperOrigin-RevId: 211459453
transposed matrix.
PiperOrigin-RevId: 211453816
PiperOrigin-RevId: 211450476
This improves build times when using the downloaded clang toolchain.
Additionally, remove '-B/usr/bin' flags from the cuda CROSSTOOL when using
the downloaded toolchain.
It was forcing 'clang' to first search for the linker in '/usr/bin',
preventing downloaded LLD from being selected.
PiperOrigin-RevId: 211430374
PiperOrigin-RevId: 211423600
asynchronously
By the time the memcpy actually executes, the on-stack source would have gone out
of scope. Also drop an unneeded BlockHostUntilDone call from the autotune code.
PiperOrigin-RevId: 211422876
This introduces a connection between forward and backward cells across subsequent layers when stacking bidirectional RNN Ops on top of each other.
In more detail:
Previously, the Op had only one input that was fed into the layer in the
following way:

       INPUT  (INPUT_REVERSED)
         |         |
      ---------------------
      | FW_RNN   BW_RNN  |  <----- bidi-RNN cell (with one input / two outputs)
      ---------------------
         |         |
       FW_OUT    BW_OUT

Now, the Op can have an (optional) auxiliary input in the following way:

      AUX_INPUT   (AUX_INPUT_REVERSED)
          |             |
      INPUT |   (INPUT_R'D.)|
        |   |       |       |
      -----------------------
      |  \  /       \  /    |
      | FW_RNN     BW_RNN   |  <----- bidi-RNN cell (with 2 inputs / 2 outputs)
      -----------------------
          |             |
        FW_OUT        BW_OUT

When stacking these Ops, previously, only the following flow was allowed:

        Input
       /     \
  FW_RNN1   BW_RNN1
     |         |
     |         |
  FW_RNN2   BW_RNN2
     |         |
     |         |
  FW_RNN3   BW_RNN3
       \     /
        Output

With the introduction of an auxiliary input to the bidi-RNN layer, the forward
(FW_RNNi) output of the ith layer is fed as the input to the next layer (hence,
an input to both FW_RNN{i+1} and BW_RNN{i+1}), and the backward output is fed as
the auxiliary input to both FW_RNN{i+1} and BW_RNN{i+1}. This way, the stacking
can be changed to allow for "cross-linking" between subsequent layers in the
following way:

        Input
       /     \
  FW_RNN1   BW_RNN1
     |  \   /  |
     |  /   \  |
  FW_RNN2   BW_RNN2
     |  \   /  |
     |  /   \  |
  FW_RNN3   BW_RNN3
       \     /
        Output
PiperOrigin-RevId: 211401475
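The cross-linked stacking described above can be sketched in plain Python. The `stacked_bidi` helper and the cell callables below are hypothetical stand-ins, not the actual TFLite kernels: each cell takes `(input, aux_input)` and returns an output sequence.

```python
def stacked_bidi(x, layers):
    """Stack bidi-RNN layers with cross-linking, as described above.

    layers: list of (fw_cell, bw_cell) pairs; each cell is a callable
    taking (input, aux_input) and returning its output.
    """
    inp, aux = x, None  # the first layer has no auxiliary input
    for fw_cell, bw_cell in layers:
        fw_out = fw_cell(inp, aux)
        bw_out = bw_cell(inp, aux)
        # Cross-link: the FW output becomes the next layer's input, and the
        # BW output becomes the next layer's auxiliary input.
        inp, aux = fw_out, bw_out
    return inp, aux

# Toy numeric "cells", just to exercise the wiring deterministically.
fw = lambda i, a: i + (a or 0) + 1
bw = lambda i, a: i + (a or 0) + 2

# One layer: fw sees (0, None) -> 1, bw sees (0, None) -> 2.
assert stacked_bidi(0, [(fw, bw)]) == (1, 2)
# Two layers: layer 2 sees input 1 and aux 2 -> (4, 5).
assert stacked_bidi(0, [(fw, bw)] * 2) == (4, 5)
```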
PiperOrigin-RevId: 211387503
PiperOrigin-RevId: 211378182
PiperOrigin-RevId: 211378028
PiperOrigin-RevId: 211377977
The autotune code assumes a clean slate, but there might be things from
previous program executions still pending on the streams owned by the executor.
Do a full host-device sync before autotuning to flush out any pending work.
I'm still somewhat confused about how autotune can interfere with other buffers.
There might be more things going wrong ...
PiperOrigin-RevId: 211369162
We use the same trick that is used in the TPU backend.
PiperOrigin-RevId: 211344106
Cudnn supports grouped convolutions, so we don't need the
ConvolutionFeatureGroupConverter pass and can instead set the group_count
parameter on the cudnn custom calls.
PiperOrigin-RevId: 211339551
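As a sketch of what the group count means (illustrative numpy, not the cudnn custom call, and the helper name is invented): a grouped convolution splits the input and output channels into independent groups, and the 1x1 case reduces to a block-diagonal matrix product.

```python
import numpy as np

def grouped_conv1x1(x, w, group_count):
    """Hypothetical grouped 1x1 convolution over channels-last data.

    x: [batch, channels_in]
    w: [group_count, cin_per_group, cout_per_group]
    Each channel group is convolved independently with its own filter.
    """
    xs = np.split(x, group_count, axis=1)
    return np.concatenate([xg @ wg for xg, wg in zip(xs, w)], axis=1)

x = np.random.rand(2, 6)
w = np.random.rand(3, 2, 4)  # 3 groups, 2 in-channels / 4 out-channels each
y = grouped_conv1x1(x, w, 3)
assert y.shape == (2, 12)
# Each output group depends only on its own input group.
assert np.allclose(y[:, :4], x[:, :2] @ w[0])
```

Passing the group count straight to the backend avoids materializing the equivalent block-diagonal (mostly zero) full filter, which is what the ConvolutionFeatureGroupConverter pass effectively did.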
PiperOrigin-RevId: 211339000
PiperOrigin-RevId: 211323840
Reinstate the use of integral-exponent power function MathUtil::IPow, but make sure to use a floating point base, so as to compute the result using floating point arithmetic. This behaviour is equivalent to, but faster than, std::pow.
Note that care must be taken to convert the base to double, which we effect by providing an explicit template type argument for MathUtil::IPow.
PiperOrigin-RevId: 211290304
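MathUtil::IPow itself is C++; the following is just a Python sketch of the same idea: exponentiation by squaring over an integral exponent, with the base promoted to floating point so every multiplication is a floating-point multiplication (mirroring the explicit `MathUtil::IPow<double>` template argument).

```python
def ipow(base: float, exp: int) -> float:
    """Integral-exponent power via exponentiation by squaring.

    Equivalent to base ** exp for exp >= 0, but uses only O(log exp)
    multiplications; the base is promoted to float up front.
    """
    result = 1.0
    b = float(base)  # promote so all arithmetic is floating point
    while exp > 0:
        if exp & 1:
            result *= b
        b *= b
        exp >>= 1
    return result

assert ipow(3.0, 5) == 243.0
assert ipow(2.0, 10) == 1024.0
assert isinstance(ipow(10, 3), float)  # integer base is promoted
```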
Happened to observe this come up in a linear algebra workload.
PiperOrigin-RevId: 211290278
PiperOrigin-RevId: 211257009
PiperOrigin-RevId: 211226585
This allows fine-grained control over recording in some cases, for example the
following where we want d2y but not d2z:

  x1 = tf.Variable(2.0, trainable=False)
  x2 = tf.Variable(2.0, trainable=False)
  with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
      tape1.watch(x1)
      tape2.watch([x1, x2])
      y = x1 ** 3
      z = x2 ** 2
    dy, dz = tape2.gradient([y, z], [x1, x2])
  d2y, d2z = tape1.gradient([dy, dz], [x1, x2])
  assert d2z is None
PiperOrigin-RevId: 211206506
PiperOrigin-RevId: 211204708
StringPiece and string_view are the same now, no need to convert between them.
PiperOrigin-RevId: 211195959