Commit message
The core of the change is to have the gradient tape capture
distributed variables instead of plain ResourceVariables.
In other words, we move the distribution awareness from defun
down to the tape and rely on distributed variable magic to provide us
with the right variable at runtime.
In tower context, we always watch the container (e.g. MirroredVariable).
In cross-tower context, we always watch all the components.
PiperOrigin-RevId: 216430530
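A minimal sketch of the behavior described, written against today's public
tf.distribute API (MirroredStrategy and strategy.run postdate this commit and
are assumptions here, not the code it touched). Inside a replica ("tower")
context, the tape watches the MirroredVariable container, so the gradient
resolves to the right per-replica component at runtime:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # mirrors variables across local devices

with strategy.scope():
    v = tf.Variable(2.0)  # a distributed (mirrored) variable

def step():
    with tf.GradientTape() as tape:
        # The tape watches the MirroredVariable container itself,
        # not a plain per-device ResourceVariable.
        y = v * v
    return tape.gradient(y, v)  # resolved to the right component at runtime

per_replica_grads = strategy.run(step)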
GradientTape.
For more complex use cases, this allows fine-grained control over what is
tracked by the tape.
PiperOrigin-RevId: 211948236
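A sketch of this kind of fine-grained control, assuming it refers to the
watch_accessed_variables flag available in today's tf.GradientTape (that
mapping is an assumption, not stated in the message):

import tensorflow as tf

a = tf.Variable(3.0)
b = tf.Variable(4.0)

# With watch_accessed_variables=False the tape tracks nothing automatically;
# only explicitly watched variables are recorded.
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(a)
    loss = a * b

grads = tape.gradient(loss, [a, b])
# grads[0] == 4.0 (the value of b); grads[1] is None since b was never watched.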
This allows fine-grained control over recording in some cases, for example the
following, where we want d2y but not d2z:

import tensorflow as tf

x1 = tf.Variable(2.0, trainable=False)
x2 = tf.Variable(2.0, trainable=False)
with tf.GradientTape() as tape1:
  with tf.GradientTape() as tape2:
    tape1.watch(x1)
    tape2.watch([x1, x2])
    y = x1 ** 3
    z = x2 ** 2
  dy, dz = tape2.gradient([y, z], [x1, x2])
d2y, d2z = tape1.gradient([dy, dz], [x1, x2])
assert d2z is None
PiperOrigin-RevId: 211206506
PiperOrigin-RevId: 196995160
The set of tapes needs to be global to enable multithreaded programming
(where it's natural for tensors to cross threads during reduction operations),
but each thread still needs to be able to locally pause recording while
it does gradient-related bookkeeping (like custom gradients or initialization).
Also removes a mutex from the thread-local structure, since it's unnecessary:
we always hold the GIL while calling across the Python-C boundary
unless we explicitly release it.
PiperOrigin-RevId: 181246570
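The per-thread pause is visible in the public API as stop_recording; a small
illustration (this commit is about the bookkeeping underneath, so the
public-API framing here is an assumption):

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x * x
    # Pause recording on this thread only, e.g. for bookkeeping that
    # should not become part of the gradient computation.
    with tape.stop_recording():
        y_snapshot = tf.identity(y)  # not traced by the tape
    z = y * 2.0

print(tape.gradient(z, x))  # 12.0; nothing from the paused block contributes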
Added two simple tests for persistent tapes, and manually tested
that calling "del" on a gradient tape releases all tensors.
Also:
- Add missing Py_DECREF to error case in MakeTensorIDList
- Make a couple of error messages more descriptive
PiperOrigin-RevId: 176718477
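For reference, the persistent-tape usage pattern being tested, using today's
public argument name (persistent=True):

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape(persistent=True) as tape:
    y = x * x
    z = y * y

# A persistent tape allows gradient() to be called more than once.
dz_dx = tape.gradient(z, x)  # 4 * x**3 = 108
dy_dx = tape.gradient(y, x)  # 2 * x = 6

# Dropping the tape releases the tensors it holds onto.
del tape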
PiperOrigin-RevId: 175704617
PiperOrigin-RevId: 175531148
PiperOrigin-RevId: 175346269
Neutral-to-positive on all benchmarks. Also reduces overhead of should_record.
PiperOrigin-RevId: 175057104
The tape stack is still in Python, as is the backprop code.
PiperOrigin-RevId: 172151189
While at it, clean up some dead code/comments in tape.py
PiperOrigin-RevId: 172143125
They fit better in future function objects (this simplifies moving the tape to C).
PiperOrigin-RevId: 171603665
PiperOrigin-RevId: 170617321
PiperOrigin-RevId: 169195496
PiperOrigin-RevId: 168864350
PiperOrigin-RevId: 168782341
PiperOrigin-RevId: 168448557
tape.watch_variable() replaces tape.watch() and is now called on ResourceVariable objects instead of their underlying handles.
implicit_grad() now returns a list of (gradient, variable) pairs to be consistent with tf.Optimizer's interface.
PiperOrigin-RevId: 168232055
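The (gradient, variable) pairing is exactly what apply_gradients consumes. A
sketch of the intended flow using current public APIs (implicit_grad itself
was a contrib-era helper, so the optimizer wiring below is an assumption
about usage, not the commit's code):

import tensorflow as tf

v = tf.Variable(1.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = (v - 3.0) ** 2

# A list of (gradient, variable) pairs, the format apply_gradients expects.
grads_and_vars = list(zip(tape.gradient(loss, [v]), [v]))
opt.apply_gradients(grads_and_vars)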
PiperOrigin-RevId: 167074863
their outputs.
PiperOrigin-RevId: 167042517
Improper wrapping and unwrapping of tensors led to tracing being dropped.
PiperOrigin-RevId: 166910119
PiperOrigin-RevId: 165772481
To this end, also adds a property, `device`, to TensorNode.
PiperOrigin-RevId: 165726368
in subsequent CLs.
PiperOrigin-RevId: 165632053
As part of this change, make HloAliasAnalysis a thinner layer that
basically only holds a map from HloValue to HloBuffer and vice versa.
PiperOrigin-RevId: 164923041
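A conceptual sketch of that thinner layer (in Python rather than the actual
C++, with hypothetical HloValue/HloBuffer stand-ins): little more than the
value-to-buffer map and its inverse, kept in sync so the analysis can be
updated in place after graph changes.

from collections import defaultdict

class AliasAnalysisSketch:
    """Hypothetical stand-in: holds only the HloValue <-> HloBuffer mapping."""

    def __init__(self):
        self._buffer_for_value = {}                 # value -> buffer
        self._values_in_buffer = defaultdict(set)   # buffer -> {values}

    def assign(self, value, buffer):
        # Keep both directions consistent when a value moves to a new buffer.
        old = self._buffer_for_value.get(value)
        if old is not None:
            self._values_in_buffer[old].discard(value)
        self._buffer_for_value[value] = buffer
        self._values_in_buffer[buffer].add(value)

    def buffer_containing(self, value):
        return self._buffer_for_value[value]

    def values_in(self, buffer):
        return set(self._values_in_buffer[buffer])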
164923041 by meheff:
Make HloAliasAnalysis updatable after changes to the HLO graph.
As part of this change make HloAliasAnalysis a thinner layer which
basically only holds a map from HloValue to HloBuffer and vice versa.
--
PiperOrigin-RevId: 164923041
PiperOrigin-RevId: 164902588