path: root/tensorflow/contrib/distribute/python/mirrored_strategy_multigpu_test.py
Commit log (message, author, date):
* Make defun work under distributed strategies. (Igor Ganichev, 2018-10-09)
  The core of the change is to have the gradient tape capture distributed variables instead of plain ResourceVariables. In other words, we move the distribution awareness from defun down to the tape and rely on distributed-variable magic to provide us with the right variable at runtime. In tower context, we always watch the container (e.g. the MirroredVariable). In cross-tower context, we always watch all the components. PiperOrigin-RevId: 216430530
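  A minimal sketch of what this enables, written against the TF 1.x contrib API of this era; the module paths, eager usage, and the call_for_each_tower invocation are assumptions on my part, not from the commit:

    import tensorflow as tf
    tf.enable_eager_execution()

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        v = tf.contrib.eager.Variable(2.0)  # becomes a MirroredVariable

    @tf.contrib.eager.defun
    def scale(x):
        return v * x  # captures the distributed variable, not one component

    def tower_fn():
        with tf.GradientTape() as tape:
            y = scale(tf.constant(3.0))
        # In tower context the tape watches the container (the
        # MirroredVariable itself), so this gradient is well-defined.
        return tape.gradient(y, v)

    grads = strategy.call_for_each_tower(tower_fn)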
* Change semantics of DistributionStrategy.update() to make sure the output depends on the updates across all mirrors. (A. Unique TensorFlower, 2018-10-01)
  Before this change, update() would return a Mirrored value where each component was an update to a single mirror. This caused a problem: for reading purposes, other DistributionStrategy methods would consider it okay to read any single component, so if you did something like session.run(strategy.update(...)) it would only perform the update on one replica. The fix is to have the output be a Mirrored value that is actually the identity operation returning the output on that device, but with a control dependency making sure that the update actually happens on all the replicas. This fix was already present in MirroredVariable._assign_func; this CL moves the fix into update() and generalizes it to multiple return values. To disable this new grouping behavior, you may now pass "grouped=False" to update(). For example, some callers (like Optimizer) perform a lot of updates and prefer to group all of them together at once for performance reasons. In this case, we still want to make sure the caller executes the update on all replicas, so we return an unwrapped value instead of a Mirrored value. This has the happy side effect of removing a bunch of unwrap calls in client code, since unwrapping was the only safe way to use the Mirrored value we used to return. PiperOrigin-RevId: 215301909
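  A hedged sketch of the new semantics (contrib-era graph mode; the exact update() calling convention here is my reading of this message, not verified against the code):

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        v = tf.get_variable("v", initializer=0.0)

    def add_one(var):
        return var.assign_add(1.0)

    # Default (grouped): the returned Mirrored value carries control
    # dependencies, so running it performs the update on every replica.
    grouped_update = strategy.update(v, add_one)

    # grouped=False: per-replica values come back unwrapped; the caller
    # (e.g. an Optimizer batching many updates) must run all of them.
    per_replica_updates = strategy.update(v, add_one, grouped=False)

    with tf.Session() as sess:
        sess.run(v.initializer)
        sess.run(grouped_update)  # update happens on all replicas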
* Automated rollback of commit 7f1d70d97f543d69a9f02cd6df0964f22f9278f3. (Rohan Jain, 2018-09-28)
  PiperOrigin-RevId: 214989908
* Switching Distribution strategies to use MultiDeviceIterator. Currently only supported in Graph mode using initializable iterators. (Rohan Jain, 2018-09-25)
  In a subsequent change, we'll add support for Eager mode as well. This removes the prefetching_ops_v2 code. PiperOrigin-RevId: 214546754
* Keep only weak references to variables in graph functions. (Allen Lavoie, 2018-09-17)
  This enables cleanup of the variables referenced in defunned methods of objects when the object is garbage collected. Since one PolymorphicFunction is created per @defun, decorated methods before this change held on to all of the variables referenced in that method for any instance of the class (i.e. variables which should have been object-scoped were scoped to the lifetime of the class definition). Raises an exception if variables used in the function have been deleted when it is called, which means no local variables. PiperOrigin-RevId: 213337256
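  A hedged illustration of the lifetime behavior described above (eager mode; the class and method names are mine, not from the commit):

    import tensorflow as tf
    tf.enable_eager_execution()

    class Scaler(object):
        def __init__(self):
            self.w = tf.contrib.eager.Variable(2.0)

        @tf.contrib.eager.defun
        def scale(self, x):
            return self.w * x

    s = Scaler()
    print(s.scale(tf.constant(3.0)))  # 6.0
    # The function holds only a weak reference to s.w, so deleting the
    # object frees the variable; calling the defun after its variables
    # are deleted raises an exception.
    del s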
* Add `num_gpus_per_worker` argument to MirroredStrategy. (Yuefeng Zhou, 2018-08-30)
  PiperOrigin-RevId: 211008923
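  For example (a one-line sketch; other constructor arguments of the contrib MirroredStrategy are omitted):

    import tensorflow as tf

    # Each worker in the cluster mirrors variables across 2 local GPUs.
    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus_per_worker=2)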
* Add new aggregation mode "ONLY_FIRST_TOWER" and use it for the global step counter. (A. Unique TensorFlower, 2018-08-29)
  This allows us to get rid of the increment_var() function and just use a standard assign_add(). PiperOrigin-RevId: 210743165
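  A hedged sketch of the global-step pattern this enables (enum value as named in this commit; later releases renamed towers to replicas):

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        global_step = tf.get_variable(
            "global_step", shape=[], dtype=tf.int64,
            initializer=tf.zeros_initializer(), trainable=False,
            aggregation=tf.VariableAggregation.ONLY_FIRST_TOWER)
        # No special increment_var() helper needed any more:
        increment = global_step.assign_add(1)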
* Reduce learning_rate in the multi-worker MirroredStrategy test. (Yuefeng Zhou, 2018-08-24)
  PiperOrigin-RevId: 210165808
* Make sure all assignments to a mirrored variable happen. (A. Unique TensorFlower, 2018-08-24)
  The failure mode being fixed: when you session.run(assignment), where assignment is the Mirrored value returned by ResourceVariable.assign*, only one of the components of assignment is executed. Now that it is safer, allow session.run() on Mirrored values (not just MirroredVariables). PiperOrigin-RevId: 210149461
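  A hedged sketch of the now-safe pattern (graph mode, contrib API):

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        v = tf.get_variable("v", initializer=0.0)

    assignment = v.assign_add(1.0)  # a Mirrored value

    with tf.Session() as sess:
        sess.run(v.initializer)
        sess.run(assignment)  # after this fix, all components are updated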
* Make sure all distribution strategies work with clusters with a chief node. (Yuefeng Zhou, 2018-08-24)
  PiperOrigin-RevId: 210100001
* Implemented the configure method and properties needed by the distribute coordinator in MirroredStrategy. (Yuefeng Zhou, 2018-08-22)
  PiperOrigin-RevId: 209848375
* 1. Move distribution strategy context utility methods to a separate file with few dependencies. 2. Move the stack used in distribution strategy context to the graph. (Priya Gupta, 2018-08-14)
  The first change allows us to import these utilities in some places without creating circular dependencies, as the original file imported many things. The second allows us to use different strategies in different graphs (e.g. in train and eval), as sketched below. This fixes #21412 and #21180. PiperOrigin-RevId: 208680454
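  A hedged sketch of what a graph-local stack allows, per item 2 (both strategy classes existed in contrib at this time; the exact usage is an assumption):

    import tensorflow as tf

    train_graph = tf.Graph()
    with train_graph.as_default():
        with tf.contrib.distribute.MirroredStrategy(num_gpus=2).scope():
            w_train = tf.get_variable("w", initializer=1.0)

    # A different strategy in a separate graph (e.g. for eval) no longer
    # collides with the training graph's strategy stack.
    eval_graph = tf.Graph()
    with eval_graph.as_default():
        with tf.contrib.distribute.OneDeviceStrategy("/device:GPU:0").scope():
            w_eval = tf.get_variable("w", initializer=1.0)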
* Add comments to MirroredStrategy's reduce function. Also add more unit tests. (Anjali Sridhar, 2018-08-06)
  PiperOrigin-RevId: 207649000
* Resolve distributed variables captured by defun at call time. (Igor Ganichev, 2018-08-06)
  Before this change, when a function was called in a distribution strategy context, it would capture the component variables from some device and always use these variables, even when the function was executed on a different device. This CL "reevaluates" distributed variables to get the correct variable at call time. These correct variables are then passed to the function. We don't handle distributed tensors: first, because the mechanics for handling distributed tensors differ from those for distributed variables, their support would add significant complexity to already complex defuns; second, there is no easy way for users to have a function capture a distributed tensor or feed a distributed tensor explicitly. If this changes, we can support them (the code exists in this CL's history). We also don't handle distributed variables explicitly passed into the function, for similar reasons. PiperOrigin-RevId: 207640908
* Add parameter server distribution. (Yuefeng Zhou, 2018-07-27)
  PiperOrigin-RevId: 206289143
* Add support for the `is_tensor_like` property to DistributedValues and add support for calling `assign` on TowerLocalVariables. (Anjali Sridhar, 2018-07-22)
  PiperOrigin-RevId: 205595323
* Add `synchronization` and `aggregation` args to the layer `add_weight()` API. (Pavithra Vijay, 2018-07-09)
  These args will be used for distributed variables. Migrate all usages of `tower_local_var_scope` to the new args. PiperOrigin-RevId: 203855963
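  A hedged sketch of the new add_weight() args: a tower-local, mean-aggregated counter-style weight. The layer name and shapes are mine; only the two keyword args come from this commit.

    import tensorflow as tf

    class CountingLayer(tf.keras.layers.Layer):
        def build(self, input_shape):
            # ON_READ + MEAN: a tower-local variable whose reads aggregate
            # the per-device copies; such variables must be non-trainable.
            self.calls = self.add_weight(
                name="calls", shape=(), dtype=tf.float32,
                initializer=tf.zeros_initializer(), trainable=False,
                synchronization=tf.VariableSynchronization.ON_READ,
                aggregation=tf.VariableAggregation.MEAN)
            super(CountingLayer, self).build(input_shape)

        def call(self, inputs):
            return inputs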
* Allow is_initialized and initializer to be called on MirroredVariables and TowerLocalVariables. (Anjali Sridhar, 2018-07-06)
  PiperOrigin-RevId: 203520287
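  For example (a hedged graph-mode sketch):

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        v = tf.get_variable("v", initializer=1.0)  # a MirroredVariable

    with tf.Session() as sess:
        sess.run(v.initializer)              # initializes every component
        print(sess.run(v.is_initialized()))  # True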
* Small fixes in the VariableSynchronization and VariableAggregation change. (Pavithra Vijay, 2018-07-02)
  PiperOrigin-RevId: 202983273
* Add `synchronization` and `aggregation` args to get_variable(). (Pavithra Vijay, 2018-06-29)
  These args will be used for distributed variables. Add enum `VariableSynchronization` with values for `synchronization`: AUTO, UNREPLICATED, ON_WRITE, ON_READ. Add enum `VariableAggregation` with values for `aggregation`: NONE, SUM, MEAN. Replace all the aggregation-mode strings in distribution strategy with the enum values. Update MirroredStrategy to use these parameters to decide whether a variable should be Mirrored or TowerLocal. Update the different distribution strategy value types to use the `VariableAggregation` enum. PiperOrigin-RevId: 202736077
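  A hedged sketch of both variable kinds under MirroredStrategy, using the enum values listed in this commit (note that UNREPLICATED was renamed in later releases):

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        # ON_WRITE + MEAN: a Mirrored variable, kept in sync on writes.
        w = tf.get_variable(
            "w", shape=(), initializer=tf.ones_initializer(),
            synchronization=tf.VariableSynchronization.ON_WRITE,
            aggregation=tf.VariableAggregation.MEAN)
        # ON_READ + SUM: a TowerLocal variable, summed across devices on read.
        loss_sum = tf.get_variable(
            "loss_sum", shape=(), initializer=tf.zeros_initializer(),
            trainable=False,
            synchronization=tf.VariableSynchronization.ON_READ,
            aggregation=tf.VariableAggregation.SUM)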
* Enable assign, assign_add and assign_sub to be called on MirroredVariables in cross-tower and tower context. (Anjali Sridhar, 2018-06-26)
  PiperOrigin-RevId: 202162272
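  A hedged sketch of both contexts (call_for_each_tower is the contrib-era name as I recall it; treat the details as assumptions):

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
    with strategy.scope():
        v = tf.get_variable("v", initializer=0.0)

        # Cross-tower context: directly under scope().
        cross_tower_add = v.assign_add(1.0)

        # Tower context: inside call_for_each_tower.
        def tower_fn():
            return v.assign_sub(0.5)
        tower_sub = strategy.call_for_each_tower(tower_fn)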
* Replace unnecessary `()` in `run_in_graph_and_eager_modes()`. (Tom Hennigan, 2018-06-22)
  PiperOrigin-RevId: 201652888
* Make regroup work on tower-local variables as well. (A. Unique TensorFlower, 2018-06-21)
  PiperOrigin-RevId: 201554738
* Switch away from DistributionStrategy.fetch() (mostly just in tests) so we can delete it. (A. Unique TensorFlower, 2018-06-20)
  Frequently we can now delete the call entirely, but in other cases we switch to read_var(). This revealed some bugs, also fixed in this CL:
  * For MirroredStrategy: fix a read_var(mean_tower_local) bug.
  * Support get() for Mirrored values that are not MirroredVariables, and make them DistributedDelegates so we can operate on them in cross-tower mode.
  * Actually iterate through the available devices in MirroredStrategy.get().
  With this and the already-submitted 201390698, we can pass mirrored variables and other mirrored values directly to self.evaluate() in tests. PiperOrigin-RevId: 201435436
* Allow TowerLocalVars to be updated with the same value across all towers. (Anjali Sridhar, 2018-06-20)
  PiperOrigin-RevId: 201379124
* Respect name scopes opened in tower mode when creating vars in cross-tower mode. (Anjali Sridhar, 2018-06-05)
  PiperOrigin-RevId: 199319758
* Remove _USE_C_API staging in tests now that the C API is enabled by default. (Skye Wanderman-Milne, 2018-05-16)
  This is in preparation for removing the _USE_C_API toggle altogether. PiperOrigin-RevId: 196920890
* Handle delayed variable initialization in MirroredStrategy. Test with RNN layer. (Priya Gupta, 2018-05-15)
  Bug reported and solution suggested in #19069. PiperOrigin-RevId: 196718454
* Fixes for accessing variables with a MirroredStrategy in a cross-tower context. (A. Unique TensorFlower, 2018-05-07)
  * Only provide read-only access to variables via get().
  * Don't fail in get() if the variable isn't copied to the current device.
  * Make _as_graph_element() return the aggregate value for tower-local variables (instead of the incorrect previous behavior of returning the primary).
  PiperOrigin-RevId: 195711474
* Change distribution.distribute_dataset to accept an input_fn instead of a dataset. (Yuefeng Zhou, 2018-04-18)
  PiperOrigin-RevId: 193437651
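  A hedged sketch of the new calling convention; the iterator calls follow the initializable-iterator commit just below, and the exact method names on the returned dataset are assumptions:

    import tensorflow as tf

    strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

    def input_fn():
        # The strategy calls this function itself, instead of the caller
        # handing over a single Dataset instance.
        return tf.data.Dataset.from_tensor_slices([1., 2., 3., 4.]).batch(2)

    dist_dataset = strategy.distribute_dataset(input_fn)
    iterator = dist_dataset.make_initializable_iterator()

    with tf.Session() as sess:
        sess.run(iterator.initializer)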
* Add support for initializable iterator in distribution strategies. Use that in estimator. (Priya Gupta, 2018-04-18)
  PiperOrigin-RevId: 193394603
* Add tf.contrib.distribute, which defines classes DistributionStrategy and MirroredStrategy, and related functionality. (A. Unique TensorFlower, 2018-03-29)
  Also add tf.contrib.optimizer_v2, an update to the Optimizer API. RELNOTES: Can now pass tf.contrib.distribute.MirroredStrategy() to tf.estimator.RunConfig() to run an Estimator model on multiple GPUs on one machine. PiperOrigin-RevId: 190996247
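  A hedged sketch of the RELNOTES usage (my_model_fn is a placeholder; train_distribute is the RunConfig argument as I recall it from this era):

    import tensorflow as tf

    def my_model_fn(features, labels, mode):
        # Hypothetical model_fn; a real one returns a tf.estimator.EstimatorSpec.
        raise NotImplementedError

    strategy = tf.contrib.distribute.MirroredStrategy()
    config = tf.estimator.RunConfig(train_distribute=strategy)
    estimator = tf.estimator.Estimator(model_fn=my_model_fn, config=config)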