path: root/tensorflow/contrib/distribute/python/mirrored_strategy.py
* Make defun work under distributed strategies. (Igor Ganichev, 2018-10-09)
  The core of the change is to have the gradient tape capture distributed
  variables instead of plain ResourceVariables. In other words, we move the
  distribution awareness from defun down to the tape and rely on distributed
  variable magic to provide us with the right variable at runtime. In tower
  context, we always watch the container (e.g. MirroredVariable). In
  cross-tower context, we always watch all the components.
  PiperOrigin-RevId: 216430530
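  To make the new behavior concrete, here is a minimal sketch under the
  contrib-era API of the time; tf.contrib.eager.defun, the num_gpus
  argument, and call_for_each_tower are assumed names from that era, not
  guaranteed by this commit:

  ```
  # Minimal sketch, assuming contrib-era TF (tf.contrib.eager.defun,
  # MirroredStrategy(num_gpus=...), call_for_each_tower).
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  with strategy.scope():
    v = tf.get_variable("v", initializer=2.0)  # a MirroredVariable

    @tf.contrib.eager.defun
    def scale(x):
      # The tape captures the distributed variable itself, so each tower
      # resolves it to the right component at runtime.
      return v * x

    # In tower context the container (MirroredVariable) is watched.
    per_tower_out = strategy.call_for_each_tower(
        lambda: scale(tf.constant(3.0)))
  ```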
* Merge pull request #22591 from EFanZh:fix-docs (TensorFlower Gardener, 2018-10-03)
  PiperOrigin-RevId: 215639962
* Change semantics of DistributionStrategy.update() to make sure the output
  depends on the updates across all mirrors. (A. Unique TensorFlower, 2018-10-01)
  Before this change, update() would return a Mirrored value where each
  component was an update to a single mirror. This caused a problem: for
  reading purposes, other DistributionStrategy methods consider it okay to
  read any single component, so something like
  session.run(strategy.update(...)) would only perform the update on one
  replica.

  The fix is to have the output be a Mirrored value that is actually the
  identity operation returning the output on that device, but with a control
  dependency making sure that the update actually happens on all the
  replicas. This fix was already present in MirroredVariable._assign_func;
  this CL moves the fix into update() and generalizes it to multiple return
  values.

  To disable this new grouping behavior, you may now pass "grouped=False" to
  update(). For example, some callers (like Optimizer) perform a lot of
  updates and prefer to group all of them together at once for performance
  reasons. In this case, we still want to make sure the caller executes the
  update on all replicas, so we return an unwrapped value instead of a
  Mirrored value.

  This has the happy side effect of removing a bunch of unwrap calls in
  client code, since unwrapping was the only safe way to use the Mirrored
  value we used to return.

  PiperOrigin-RevId: 215301909
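  A sketch of the call-site difference, assuming the contrib-era update()
  signature (the fn receives the per-device variable, and grouped is the new
  keyword described above):

  ```
  # Sketch only; contrib-era names assumed.
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  with strategy.scope():
    v = tf.get_variable("v", initializer=0.0)

    # Default (grouped): a Mirrored value whose components carry control
    # dependencies on the update happening on *all* replicas, so running
    # any one component is safe.
    grouped_update = strategy.update(v, lambda var: var.assign_add(1.0))

    # Opt out for performance-sensitive callers like Optimizer: an
    # unwrapped per-replica value, which the caller must itself run on
    # all replicas.
    ungrouped_update = strategy.update(
        v, lambda var: var.assign_add(1.0), grouped=False)
  ```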
* Automated rollback of commit 7f1d70d97f543d69a9f02cd6df0964f22f9278f3 (Rohan Jain, 2018-09-28)
  PiperOrigin-RevId: 214989908
* Disable auto_shard for MirroredStrategy by default. (Yuefeng Zhou, 2018-09-28)
  We will re-enable it when it is more robust.
  PiperOrigin-RevId: 214956066
* Fix some documentation errors. (EFanZh, 2018-09-28)
* Allow source_device to be set to /cpu:0 for the multi-device iterator in
  distribution strategies. (Rohan Jain, 2018-09-27)
  That is always the appropriate option. In the existing code, we would set
  it to a partially specified "worker" device name that was ambiguous and
  could end up on the GPU.
  PiperOrigin-RevId: 214882658
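  For reference, a sketch of pinning the source device on a
  MultiDeviceIterator; the tf.data.experimental export path is the later
  public name and is an assumption for this contrib-era snapshot:

  ```
  # Sketch, assuming the later tf.data.experimental.MultiDeviceIterator
  # export; at the time of this commit the class lived in an internal
  # module.
  import tensorflow as tf

  dataset = tf.data.Dataset.range(100).batch(8)

  iterator = tf.data.experimental.MultiDeviceIterator(
      dataset,
      devices=["/gpu:0", "/gpu:1"],
      source_device="/cpu:0")  # fully specified; never lands on a GPU

  per_device_batches = iterator.get_next()  # one batch per device
  ```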
* Switch distribution strategies to use MultiDeviceIterator. (Rohan Jain, 2018-09-25)
  Currently only supported in graph mode using initializable iterators. In a
  subsequent change, we'll add support for eager mode as well. This removes
  the prefetching_ops_v2 code.
  PiperOrigin-RevId: 214546754
* Set session_config.isolate_session_state to True for all strategies except
  ParameterServerStrategy, where variables are shared across sessions.
  (Priya Gupta, 2018-09-04)
  PiperOrigin-RevId: 211573447
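  A sketch of the equivalent session configuration; setting the proto field
  by hand like this is for illustration only, since the strategies now do it
  themselves when configured:

  ```
  # Illustrative only: the strategies set this field themselves.
  import tensorflow as tf

  session_config = tf.ConfigProto()
  # Keep variables, queues, and other session state private to each session.
  session_config.isolate_session_state = True

  # ParameterServerStrategy is the exception: its variables live on
  # parameter servers and are intentionally shared across worker sessions,
  # so it leaves this False.
  ```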
* Add `num_gpus_per_worker` argument to MirroredStrategy. (Yuefeng Zhou, 2018-08-30)
  PiperOrigin-RevId: 211008923
* Add new aggregation mode "ONLY_FIRST_TOWER" and use it for the global step
  counter. (A. Unique TensorFlower, 2018-08-29)
  This allows us to get rid of the increment_var() function and just use a
  standard assign_add().
  PiperOrigin-RevId: 210743165
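  A sketch of the new mode on a global-step-style counter; ONLY_FIRST_TOWER
  is the contrib-era enum spelling (later renamed ONLY_FIRST_REPLICA), so
  availability depends on the TF version:

  ```
  # Sketch, assuming the contrib-era tf.VariableAggregation.ONLY_FIRST_TOWER.
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  with strategy.scope():
    global_step = tf.get_variable(
        "global_step", shape=[], dtype=tf.int64,
        initializer=tf.zeros_initializer(), trainable=False,
        aggregation=tf.VariableAggregation.ONLY_FIRST_TOWER)

    # Only the first tower's update is applied, so a plain assign_add()
    # replaces the old increment_var() helper.
    increment_op = global_step.assign_add(1)
  ```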
* Use NCCL if there is only one worker in multi-worker MirroredStrategies. (Yuefeng Zhou, 2018-08-28)
  PiperOrigin-RevId: 210669284
* For ParameterServerStrategy, make sure to include the AggregatingVariable
  wrapper for variables in collections instead of what it wraps. (A. Unique TensorFlower, 2018-08-24)
  PiperOrigin-RevId: 210107528
* Implement the configure method and properties needed by the distribute
  coordinator in MirroredStrategy. (Yuefeng Zhou, 2018-08-22)
  PiperOrigin-RevId: 209848375
* Merge MultiWorkerMirroredStrategy into MirroredStrategy. (Yuefeng Zhou, 2018-08-16)
  PiperOrigin-RevId: 209099475
* Make tf.metrics work with TPUStrategy. (Priya Gupta, 2018-08-16)
  PiperOrigin-RevId: 209064406
* step_fn should be able to receive unwrapped inputs. (Sourabh Bajaj, 2018-08-15)
  PiperOrigin-RevId: 208929959
* Merge branch 'master' into master (Jan Hünnemeyer, 2018-08-09)
* Add an API to distribution strategy that allows running N steps. (Priya Gupta, 2018-08-08)
  Implement this for MirroredStrategy and OneDeviceStrategy. It was
  implemented in TPUStrategy earlier.
  PiperOrigin-RevId: 207961939
* Add comments to MirroredStrategy's reduce function. Also add more unit tests. (Anjali Sridhar, 2018-08-06)
  PiperOrigin-RevId: 207649000
* Resolve distributed variables captured by defun at call time. (Igor Ganichev, 2018-08-06)
  Before this change, when a function was called in a distribution strategy
  context, it would capture the component variables from some device and
  always use these variables, even when the function was executed on a
  different device. This CL "reevaluates" distributed variables to get the
  correct variable at call time. These correct variables are then passed to
  the function.

  We don't handle distributed tensors. First, because the mechanics for
  handling distributed tensors are different from those for handling
  distributed variables, supporting them added significant complexity to
  already complex defuns. Second, there is no easy way for users to have a
  function capture a distributed tensor or feed a distributed tensor
  explicitly. If this changes, we can support them (the code exists in this
  CL's history).

  We also don't handle distributed variables explicitly passed into the
  function, for similar reasons.

  PiperOrigin-RevId: 207640908
* Add the CollectiveAllReduceStrategy. (Yuefeng Zhou, 2018-08-03)
  PiperOrigin-RevId: 207348195
* Automated rollback of commit 493d7588172bcf476309b3954db342839ca37872 (Akshay Agrawal, 2018-08-03)
  PiperOrigin-RevId: 207294037
* Add the CollectiveAllReduceStrategy. (Yuefeng Zhou, 2018-08-02)
  PiperOrigin-RevId: 207215423
* Add parameter server distribution. (Yuefeng Zhou, 2018-07-27)
  PiperOrigin-RevId: 206289143
* pylint: whitespace changes (Jan Horst Hünnemeyer, 2018-07-17)
* Small fixes in the VariableSynchronization and VariableAggregation change. (Pavithra Vijay, 2018-07-02)
  PiperOrigin-RevId: 202983273
* Add `synchronization` and `aggregation` args to get_variable(). (Pavithra Vijay, 2018-06-29)
  These args will be used for distributed variables.

  Add enum `VariableSynchronization` with values for `synchronization`:
  AUTO, UNREPLICATED, ON_WRITE, ON_READ. Add enum `VariableAggregation`
  with values for `aggregation`: NONE, SUM, MEAN. Replace all the
  aggregation-method strings in distribution strategy with the enum values.

  Update MirroredStrategy to use these parameters to decide whether a
  variable should be Mirrored or TowerLocal. Update the different
  distribution strategy value types to use the `VariableAggregation` enum.

  PiperOrigin-RevId: 202736077
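  A sketch of the new arguments; the enum spellings follow the commit text,
  and how MirroredStrategy maps them to Mirrored vs. TowerLocal variables is
  summarized in the comments:

  ```
  # Sketch, assuming the contrib-era enums described above.
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  with strategy.scope():
    # ON_WRITE: kept in sync on writes, so MirroredStrategy makes this a
    # MirroredVariable.
    w = tf.get_variable(
        "w", shape=[10],
        synchronization=tf.VariableSynchronization.ON_WRITE,
        aggregation=tf.VariableAggregation.MEAN)

    # ON_READ: each tower writes its own copy, aggregated only when read,
    # so MirroredStrategy makes this a TowerLocal variable.
    counter = tf.get_variable(
        "counter", shape=[], initializer=tf.zeros_initializer(),
        trainable=False,
        synchronization=tf.VariableSynchronization.ON_READ,
        aggregation=tf.VariableAggregation.SUM)
  ```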
* Utilizing dict comprehension. (Jan Horst Hünnemeyer, 2018-06-29)
* Enable assign, assign_add and assign_sub to be called on MirroredVariables
  in both cross-tower and tower context. (Anjali Sridhar, 2018-06-26)
  PiperOrigin-RevId: 202162272
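  A sketch of both contexts; call_for_each_tower is the assumed contrib-era
  entry point for tower context, and the aggregation argument comes from the
  get_variable() commit above:

  ```
  # Sketch; contrib-era names assumed.
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  with strategy.scope():
    v = tf.get_variable(
        "v", initializer=1.0,
        aggregation=tf.VariableAggregation.MEAN)

    # Cross-tower context: the assignment applies to every component.
    cross_tower_op = v.assign_add(1.0)

    # Tower context: per-tower assignments are combined according to the
    # variable's aggregation mode (here, MEAN).
    per_tower_ops = strategy.call_for_each_tower(lambda: v.assign_sub(0.5))
  ```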
* Delete DistributionStrategy.fetch() now that it is unused. (A. Unique TensorFlower, 2018-06-21)
  PiperOrigin-RevId: 201582230
* Switch away from DistributionStrategy.fetch() (mostly just in tests) so we
  can delete it. (A. Unique TensorFlower, 2018-06-20)
  Frequently we can now delete the call entirely, but in other cases we
  switch to read_var(). This revealed some bugs, also fixed in this CL:

  * For MirroredStrategy: fix a read_var(mean_tower_local) bug.
  * Support get() for Mirrored values that are not MirroredVariables, and
    make them DistributedDelegates so we can operate on them in cross-tower
    mode.
  * Actually iterate through the available devices in MirroredStrategy.get().

  With this and already-submitted 201390698, we can pass mirrored variables
  and other mirrored values directly to self.evaluate() in tests.

  PiperOrigin-RevId: 201435436
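  A sketch of the read_var() replacement for fetch(); the tower-local
  variable here uses the ON_READ/MEAN combination the fixed bug refers to,
  and all names are contrib-era assumptions:

  ```
  # Sketch; contrib-era read_var() on a mean-aggregated tower-local variable.
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  with strategy.scope():
    loss_total = tf.get_variable(
        "loss_total", shape=[], initializer=tf.zeros_initializer(),
        trainable=False,
        synchronization=tf.VariableSynchronization.ON_READ,
        aggregation=tf.VariableAggregation.MEAN)

    # Instead of the deleted fetch(), read the aggregated value (the mean
    # across towers) as a plain tensor.
    mean_loss = strategy.read_var(loss_total)
  ```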
* Allow TowerLocalVars to be updated with the same value across all towers. (Anjali Sridhar, 2018-06-20)
  PiperOrigin-RevId: 201379124
* Disable caching_device for mirrored variables. (A. Unique TensorFlower, 2018-06-19)
  PiperOrigin-RevId: 201232817
* Make the return value of `read_var` consistently a tensor instead of
  sometimes a variable. (A. Unique TensorFlower, 2018-06-12)
  PiperOrigin-RevId: 200231463
* Automated g4 rollback of changelist 197218170 (A. Unique TensorFlower, 2018-06-12)
  PiperOrigin-RevId: 200209039
* Respect name scopes opened in tower mode when creating vars in cross-tower mode. (Anjali Sridhar, 2018-06-05)
  PiperOrigin-RevId: 199319758
* Resolve device names when passed into DistributionStrategy methods. (A. Unique TensorFlower, 2018-06-04)
  PiperOrigin-RevId: 199241723
* Public API to switch between eager execution and graph building. (Alexandre Passos, 2018-05-25)
  Now, after tf.enable_eager_execution() has been executed, entering the
  context manager of a tf.Graph will enable graph mode. So, for example:

  ```
  tf.enable_eager_execution()
  with tf.Graph().as_default():
    c = tf.constant(1.0)  # this is a graph tensor
  c2 = tf.constant(1.0)  # this is an eager tensor
  ```

  The main use case of this is allowing documentation writers to make a
  single notebook which starts with eager execution and seamlessly
  transitions to building graphs. This also makes many explicit enablings of
  graph mode in the code redundant (a cleanup CL will follow).

  PiperOrigin-RevId: 198092991
* Handle delayed variable initialization in MirroredStrategy. Test with RNN layer. (Priya Gupta, 2018-05-15)
  Bug reported and solution suggested in #19069.
  PiperOrigin-RevId: 196718454
* Add the MultiWorkerMirroredStrategy. (Yuefeng Zhou, 2018-05-04)
  PiperOrigin-RevId: 195368876
* Add device_util.resolve method, which merges with the current device as well. (Yuefeng Zhou, 2018-05-01)
  PiperOrigin-RevId: 194976633
* Change distribution.distribute_dataset to accept an input_fn instead of a
  dataset. (Yuefeng Zhou, 2018-04-18)
  PiperOrigin-RevId: 193437651
* Add support for initializable iterators in distribution strategies, and use
  that in Estimator. (Priya Gupta, 2018-04-18)
  PiperOrigin-RevId: 193394603
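  A sketch combining this commit with the input_fn change above; the
  distributed dataset's make_initializable_iterator() and its initializer
  property are assumed contrib-era names:

  ```
  # Sketch; contrib-era names assumed.
  import tensorflow as tf

  strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)

  def input_fn():
    return tf.data.Dataset.range(100).batch(8)

  with strategy.scope():
    dist_dataset = strategy.distribute_dataset(input_fn)
    iterator = dist_dataset.make_initializable_iterator()
    next_batch = iterator.get_next()  # per-device batches

  with tf.Session() as sess:
    # Initializable (not one-shot) iterators need an explicit init step,
    # which is what lets Estimator reinitialize the input pipeline.
    sess.run(iterator.initializer)
  ```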
* Internal change. (Yuefeng Zhou, 2018-03-29)
  PiperOrigin-RevId: 191024677
* Add tf.contrib.distribute, which defines the classes DistributionStrategy
  and MirroredStrategy, and related functionality. (A. Unique TensorFlower, 2018-03-29)
  Also add tf.contrib.optimizer_v2, an update to the Optimizer API.

  RELNOTES: Can now pass tf.contrib.distribute.MirroredStrategy() to
  tf.estimator.RunConfig() to run an Estimator model on multiple GPUs on one
  machine.

  PiperOrigin-RevId: 190996247
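  The RELNOTES usage, sketched; train_distribute is the RunConfig argument
  name from that release, and my_model_fn / my_input_fn are hypothetical
  placeholders for a standard Estimator setup:

  ```
  # Sketch of the release-note usage; my_model_fn and my_input_fn are
  # hypothetical placeholders.
  import tensorflow as tf

  distribution = tf.contrib.distribute.MirroredStrategy()
  config = tf.estimator.RunConfig(train_distribute=distribution)

  estimator = tf.estimator.Estimator(
      model_fn=my_model_fn,  # hypothetical: your Estimator model_fn
      config=config)
  estimator.train(input_fn=my_input_fn)  # hypothetical input_fn
  ```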