| Commit message | Author | Age |
| |
The core of the change is to have the gradient tape capture
distributed variables instead of plain ResourceVariables.
In other words, we move the distribution awareness from defun
down to tape and rely on distributed variable magic to provide us
with the right variable at runtime.
In tower context, we always watch the container (e.g. MirroredVariable).
In cross tower context, we always watch all the components.
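The watching rule above can be sketched in plain Python. This is a toy illustration only; `MirroredVariable` here is a stand-in class, not the real TensorFlow type:

```python
# Toy illustration of the tape watching rule; names are stand-ins,
# not the real TensorFlow classes.
class MirroredVariable:
    """Container holding one component variable per device."""
    def __init__(self, components):
        self.components = components  # e.g. {"/gpu:0": v0, "/gpu:1": v1}

def watched_objects(tape_watch_list, in_tower_context):
    """Return what the tape should record for each watched object."""
    watched = []
    for obj in tape_watch_list:
        if isinstance(obj, MirroredVariable):
            if in_tower_context:
                # In tower context, watch the container itself; the right
                # component is resolved by the variable at runtime.
                watched.append(obj)
            else:
                # In cross-tower context, watch every component.
                watched.extend(obj.components.values())
        else:
            watched.append(obj)
    return watched

mv = MirroredVariable({"/gpu:0": "v0", "/gpu:1": "v1"})
assert watched_objects([mv], in_tower_context=True) == [mv]
assert watched_objects([mv], in_tower_context=False) == ["v0", "v1"]
```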
PiperOrigin-RevId: 216430530
| |
output depends on the updates across all mirrors. Before this change,
update() would return a Mirrored value where each component was
an update to a single mirror. This caused a problem: for reading
purposes, other DistributionStrategy methods would consider it okay
to read any single component, so if you did something
like session.run(strategy.update(...)) it would only perform the
update on one replica. The fix is to have the output be a Mirrored
value that is actually the identity operation returning the output on
that device, but that has a control dependency making sure that the
update actually happens on all the replicas. This fix was already
present in MirroredVariable._assign_func, this CL moves the fix into
update() and generalizes it to multiple return values.
To disable this new grouping behavior, you may now pass
"grouped=False" to update(). For example, some callers (like Optimizer)
are performing a lot of updates and they prefer to group all of them
together at once for performance reasons. In this case, we still want
to make sure the caller executes the update on all replicas, so we
return an unwrapped value instead of a Mirrored value. This has the
happy side effect of removing a bunch of unwrap calls in client code,
since unwrapping was the only safe way to use the Mirrored value we
used to return.
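The grouped-vs-ungrouped behavior can be modeled with a small toy in plain Python (not TensorFlow): each "update op" is a closure, and a control dependency is modeled by having any returned per-device output run the whole group first.

```python
# Toy model of the fix described above (plain Python, not TensorFlow).
def update(per_device_updates, grouped=True):
    def run_all():
        for fn in per_device_updates.values():
            fn()
    if not grouped:
        # Ungrouped: the caller gets the raw per-device ops and is
        # responsible for running every one of them.
        return per_device_updates
    # Grouped: each per-device output carries a "control dependency"
    # on the full group, so running any single one runs them all.
    return {dev: (lambda dev=dev: (run_all(), dev)[1])
            for dev in per_device_updates}

state = {"/gpu:0": 0, "/gpu:1": 0}
updates = {d: (lambda d=d: state.__setitem__(d, state[d] + 1))
           for d in state}
mirrored = update(updates, grouped=True)
mirrored["/gpu:0"]()  # session.run() of one component...
assert state == {"/gpu:0": 1, "/gpu:1": 1}  # ...updates every device
```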
PiperOrigin-RevId: 215301909
| |
PiperOrigin-RevId: 214989908
| |
supported in Graph mode using initializable iterators. In a subsequent change, we'll add in support for Eager mode as well.
This removes prefetching_ops_v2 code.
PiperOrigin-RevId: 214546754
| |
This enables cleanup of the variables referenced in defunned methods of objects when the object is garbage collected. Since one PolymorphicFunction is created per @defun, decorated methods before this change held on to all of the variables referenced in that method for any instance of the class (i.e. variables which should have been object-scoped were scoped to the lifetime of the class definition).
Raises an exception if variables used in the function have been deleted by the time it is called, which means local variables are not supported.
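The lifetime fix can be sketched with `weakref` (an illustrative sketch only, not the real defun machinery): the per-instance function holds only a weak reference to its captured state, so it does not keep the object alive, and calling it after deletion raises.

```python
import weakref

# Illustrative sketch of the lifetime behavior described above;
# Model and make_method are hypothetical stand-ins.
class Model:
    def __init__(self):
        self.variable = [0.0]  # stand-in for a captured variable

def make_method(obj):
    ref = weakref.ref(obj)  # weak capture instead of a strong one
    def method():
        target = ref()
        if target is None:
            raise RuntimeError("captured variables have been deleted")
        return target.variable[0]
    return method

m = Model()
f = make_method(m)
assert f() == 0.0
del m  # the function does not keep the object alive...
import gc; gc.collect()
try:
    f()  # ...so the call now raises instead of leaking variables
    raised = False
except RuntimeError:
    raised = True
assert raised
```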
PiperOrigin-RevId: 213337256
| |
PiperOrigin-RevId: 211008923
| |
step counter. This allows us to get rid of the increment_var()
function and just use a standard assign_add().
PiperOrigin-RevId: 210743165
| |
PiperOrigin-RevId: 210165808
| |
being fixed is when you session.run(assignment) and assignment is the
MirroredVariable value returned by ResourceVariable.assign*, only one
of the components of assignment is executed. Now that it is safer,
allow session.run() on Mirrored values (not just MirroredVariables).
PiperOrigin-RevId: 210149461
| |
PiperOrigin-RevId: 210100001
| |
coordinator in MirroredStrategy.
PiperOrigin-RevId: 209848375
| |
with few dependencies. This allows us to import it in some places without creating circular dependencies, since the original file imported many things.
2. Move the stack used in distribution strategy context to the graph. This allows us to use different strategies in different graphs (e.g. in train and eval).
This fixes #21412 and #21180.
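Point 2 above can be sketched as a per-graph stack (plain Python; `Graph` and the strategy names are stand-ins, not TensorFlow objects):

```python
# Toy sketch of keeping the strategy stack on the graph instead of in
# a single global, so different graphs can use different strategies.
class Graph:
    def __init__(self):
        self._distribution_strategy_stack = []

def push_strategy(graph, strategy):
    graph._distribution_strategy_stack.append(strategy)

def current_strategy(graph):
    stack = graph._distribution_strategy_stack
    return stack[-1] if stack else None

train_graph, eval_graph = Graph(), Graph()
push_strategy(train_graph, "MirroredStrategy")
push_strategy(eval_graph, "OneDeviceStrategy")
# Each graph sees only its own strategy, so train and eval can differ.
assert current_strategy(train_graph) == "MirroredStrategy"
assert current_strategy(eval_graph) == "OneDeviceStrategy"
```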
PiperOrigin-RevId: 208680454
| |
PiperOrigin-RevId: 207649000
| |
Before this change, when a function was called in a distribution
strategy context, it would capture the component variables from some
device and always use these variables, even when the function is
executed on a different device.
This CL "reevaluates" distributed variables to get the correct variable
at call time. These correct variables are then passed to the function.
We don't handle distributed tensors. First, because the mechanics for handling
distributed tensors are different from handling distributed variables,
their support added significant complexity to already complex defuns.
Second, there is no easy way for users to have a function capture a distributed
tensor or feed a distributed tensor explicitly. If this changes, we can
support them (the code exists in this CL's history).
We also don't handle distributed variables explicitly passed into the
function for similar reasons.
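The "reevaluation" described above can be sketched in plain Python (names are hypothetical stand-ins for the real classes): each captured distributed variable is resolved to the component on the device the function actually runs on.

```python
# Illustrative sketch of resolving a distributed variable at call time.
class DistributedVariable:
    def __init__(self, components):
        self.components = components  # device -> component variable
    def get(self, device):
        return self.components[device]

def call_on_device(fn, captured, device):
    # Resolve each captured distributed variable to the component that
    # lives on the device where the function executes.
    resolved = [c.get(device) if isinstance(c, DistributedVariable) else c
                for c in captured]
    return fn(*resolved)

dv = DistributedVariable({"/gpu:0": 10, "/gpu:1": 20})
double = lambda v: 2 * v
assert call_on_device(double, [dv], "/gpu:0") == 20
assert call_on_device(double, [dv], "/gpu:1") == 40
```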
PiperOrigin-RevId: 207640908
| |
PiperOrigin-RevId: 206289143
| |
support for calling `assign` on TowerLocalVariables.
PiperOrigin-RevId: 205595323
| |
API. These args will be used for distributed variables.
Migrate all usages of `tower_local_var_scope` to using the new args.
PiperOrigin-RevId: 203855963
| |
TowerLocalVariables.
PiperOrigin-RevId: 203520287
| |
PiperOrigin-RevId: 202983273
| |
will be used for distributed variables.
Add Enum `VariableSynchronization` with values for `synchronization`: AUTO, UNREPLICATED, ON_WRITE, ON_READ
Add Enum `VariableAggregation` with values for `aggregation`: NONE, SUM, MEAN. Replace all the aggregation method strings in distribution strategy with the enum values.
Update Mirrored strategy to use these parameters to decide whether a variable should be Mirrored or TowerLocal.
Update the different distribution strategy value types to use the `VariableAggregation` enum.
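The enums described above can be sketched as follows (a sketch based only on the names in this commit message; the real definitions live in TensorFlow, and `variable_kind` is a hypothetical helper illustrating the Mirrored-vs-TowerLocal decision):

```python
from enum import Enum

# Sketch of the enums named in the commit message above.
class VariableSynchronization(Enum):
    AUTO = 0
    UNREPLICATED = 1
    ON_WRITE = 2
    ON_READ = 3

class VariableAggregation(Enum):
    NONE = 0
    SUM = 1
    MEAN = 2

def variable_kind(synchronization):
    # Hypothetical illustration of the decision the Mirrored strategy
    # makes from `synchronization`: ON_READ variables are tower-local.
    if synchronization is VariableSynchronization.ON_READ:
        return "TowerLocal"
    return "Mirrored"

assert variable_kind(VariableSynchronization.ON_WRITE) == "Mirrored"
assert variable_kind(VariableSynchronization.ON_READ) == "TowerLocal"
```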
PiperOrigin-RevId: 202736077
| |
in cross tower and tower context.
PiperOrigin-RevId: 202162272
| |
PiperOrigin-RevId: 201652888
| |
PiperOrigin-RevId: 201554738
| |
so we can delete it. Frequently we can now delete the call entirely,
but in other cases we switch to read_var().
This revealed some bugs also fixed in this CL:
* For MirroredStrategy: fix read_var(mean_tower_local) bug.
* Support get() for Mirrored values that are not MirroredVariables,
and make them DistributedDelegates so we can operate on them in
cross-tower mode.
* Actually iterate through the available devices in MirroredStrategy.get().
With this and already-submitted 201390698, we can pass mirrored
variables and other mirrored values directly to self.evaluate() in
tests.
PiperOrigin-RevId: 201435436
| |
PiperOrigin-RevId: 201379124
| |
PiperOrigin-RevId: 199319758
| |
This is in preparation for removing the _USE_C_API toggle altogether.
PiperOrigin-RevId: 196920890
| |
Bug reported and solution suggested in #19069
PiperOrigin-RevId: 196718454
| |
cross-tower context:
* only provide read-only access to variables via get()
* don't fail if the variable isn't copied to the current device in get()
* make _as_graph_element() return the aggregate value for tower-local
variables (instead of the incorrect previous behavior of returning
the primary)
PiperOrigin-RevId: 195711474
| |
dataset.
PiperOrigin-RevId: 193437651
| |
in estimator.
PiperOrigin-RevId: 193394603
| |
and MirroredStrategy, and related functionality.
Also add tf.contrib.optimizer_v2, an update to the Optimizer API.
RELNOTES: Can now pass tf.contrib.distribute.MirroredStrategy() to
tf.estimator.RunConfig() to run an Estimator model on multiple GPUs
on one machine.
PiperOrigin-RevId: 190996247
|