path: root/tensorflow/contrib/distribute/python/values.py
Commit history (message, author, date)
* In TPUMirroredVariable, when setting _initializer_op and _initial_value attributes, set the attributes of all the contained variables. This fixes a bug that tf.train.init_from_checkpoint doesn't overwrite the initialization values correctly for TPUMirroredVariable. (Ruoxin Sang, 2018-10-09)
  PiperOrigin-RevId: 216429476

* Add 'device' property to TPUMirroredVariable, so tf.train.init_from_checkpoint can be supported. (Ruoxin Sang, 2018-10-04)
  PiperOrigin-RevId: 215843249
* Change semantics of DistributionStrategy.update() to make sure the output depends on the updates across all mirrors. (A. Unique TensorFlower, 2018-10-01)
  Before this change, update() would return a Mirrored value where each component was an update to a single mirror. This caused a problem: for reading purposes, other DistributionStrategy methods would consider it okay to read any single component, so if you did something like session.run(strategy.update(...)) it would only perform the update on one replica.
  The fix is to have the output be a Mirrored value that is actually the identity operation returning the output on that device, but that has a control dependency making sure that the update actually happens on all the replicas. This fix was already present in MirroredVariable._assign_func; this CL moves the fix into update() and generalizes it to multiple return values.
  To disable this new grouping behavior, you may now pass "grouped=False" to update(). For example, some callers (like Optimizer) perform a lot of updates and prefer to group all of them together at once for performance reasons. In this case, we still want to make sure the caller executes the update on all replicas, so we return an unwrapped value instead of a Mirrored value. This has the happy side effect of removing a bunch of unwrap calls in client code, since unwrapping was the only safe way to use the Mirrored value we used to return.
  PiperOrigin-RevId: 215301909
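  A minimal sketch of the grouped-vs-ungrouped behavior this commit describes, written against today's tf.distribute API rather than the contrib-era one (an assumption: the contrib keyword was `grouped=False`, while the analogous knob on `tf.distribute.StrategyExtended.update` is `group`); the variable and lambda below are purely illustrative.

  ```python
  import tensorflow as tf

  strategy = tf.distribute.MirroredStrategy()
  with strategy.scope():
      v = tf.Variable(0.0)  # mirrored: one component per device

  # Grouped (default): a single result whose execution runs the update on
  # every mirror, so nothing is silently skipped when it is evaluated.
  grouped = strategy.extended.update(v, lambda var: var.assign_add(1.0))

  # Ungrouped: one result per mirror; the caller (e.g. an Optimizer batching
  # many updates) is responsible for executing all of them.
  per_replica = strategy.extended.update(
      v, lambda var: var.assign_add(1.0), group=False)
  ```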
* Move TPU variables to the TPU device in TPUStrategy. (Jonathan Hseu, 2018-09-28)
  PiperOrigin-RevId: 215027511

* Automated rollback of commit 7f1d70d97f543d69a9f02cd6df0964f22f9278f3. (Rohan Jain, 2018-09-28)
  PiperOrigin-RevId: 214989908

* Disable auto_shard for MirroredStrategy by default. We will re-enable it when it is more robust. (Yuefeng Zhou, 2018-09-28)
  PiperOrigin-RevId: 214956066

* Allowing source_device to be set to /cpu:0 for multi device iterator in distribution strategies. That is always the appropriate option. In the existing code, we would set it to a partially specified "worker" name that was ambiguous and end up on the GPU. (Rohan Jain, 2018-09-27)
  PiperOrigin-RevId: 214882658

* Switching Distribution strategies to use MultiDeviceIterator. Currently only supported in Graph mode using initializable iterators. In a subsequent change, we'll add in support for Eager mode as well. This removes prefetching_ops_v2 code. (Rohan Jain, 2018-09-25)
  PiperOrigin-RevId: 214546754

* Drop unused `_mirrored_container` property of variables that are components of a MirroredVariable. We switched to using `_distributed_container` set in the parent class `DistributedVariable`, but the code setting `_mirrored_container` was accidentally added back as a result of a merge. (A. Unique TensorFlower, 2018-08-31)
  PiperOrigin-RevId: 211111147
* Add new aggregation mode "ONLY_FIRST_TOWER" and use it for the global step counter. This allows us to get rid of the increment_var() function and just use a standard assign_add(). (A. Unique TensorFlower, 2018-08-29)
  PiperOrigin-RevId: 210743165
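  A hedged sketch of the pattern this commit enables, using the current public enum name (ONLY_FIRST_TOWER was later renamed tf.VariableAggregation.ONLY_FIRST_REPLICA); the variable name is illustrative only.

  ```python
  import tensorflow as tf

  strategy = tf.distribute.MirroredStrategy()
  with strategy.scope():
      # ONLY_FIRST_REPLICA: only the first replica's update is applied, so a
      # plain assign_add() advances the counter once per step, not per device.
      global_step = tf.Variable(
          0, dtype=tf.int64, trainable=False,
          aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA)

  # Each replica runs assign_add, but only the first replica's value counts.
  strategy.run(lambda: global_step.assign_add(1))
  ```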
* Fix error when getting optimizer variables with distribution strategy: add `_in_graph_mode` property to DistributedVariable. (Pavithra Vijay, 2018-08-24)
  PiperOrigin-RevId: 210177702
* Make sure all assignments to a mirrored variable happen. (A. Unique TensorFlower, 2018-08-24)
  The failure mode being fixed: when you session.run(assignment), where assignment is the Mirrored value returned by ResourceVariable.assign*, only one of the components of assignment is executed. Now that it is safer, allow session.run() on Mirrored values (not just MirroredVariables).
  PiperOrigin-RevId: 210149461
* For ParameterServerStrategy, make sure to include the AggregatingVariable wrapper for variables in collections instead of what it wraps. (A. Unique TensorFlower, 2018-08-24)
  PiperOrigin-RevId: 210107528

* Correctly use the aggregation mode set for variables in ParameterServerStrategy when using >1 device per machine. This means wrapping the variable instances returned in that case in a class that intercepts assign_*() method calls. (A. Unique TensorFlower, 2018-08-20)
  PiperOrigin-RevId: 209533673

* 1. Move distribution strategy context utility methods to a separate file with few dependencies. This allows us to import it in some places without creating circular dependencies, as the original file imported many things. (Priya Gupta, 2018-08-14)
  2. Move the stack used in distribution strategy context to the graph. This allows us to use different strategies in different graphs (e.g. in train and eval).
  This fixes #21412 and #21180.
  PiperOrigin-RevId: 208680454

* Add an API to distribution strategy that allows running N steps. Implement this for MirroredStrategy and OneDeviceStrategy. Implemented in TPUStrategy earlier. (Priya Gupta, 2018-08-08)
  PiperOrigin-RevId: 207961939
* Resolve distributed variables captured by defun at call time. (Igor Ganichev, 2018-08-06)
  Before this change, when a function was called in a distribution strategy context, it would capture the component variables from some device and always use these variables, even when the function is executed on a different device. This CL "reevaluates" distributed variables to get the correct variable at call time. These correct variables are then passed to the function.
  We don't handle distributed tensors. First, because the mechanics for handling distributed tensors are different from handling distributed variables, their support added significant complexity to already complex defuns. Second, there is no easy way for users to have a function capture a distributed tensor or feed a distributed tensor explicitly. If this changes, we can support them (the code exists in this CL's history).
  We also don't handle distributed variables explicitly passed into the function, for similar reasons.
  PiperOrigin-RevId: 207640908
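  The behavior described above (a captured distributed variable is resolved to the right per-device component when the function actually runs) is easiest to see with today's tf.function/strategy.run spelling rather than the contrib-era defun; this sketch only illustrates that idea, with made-up names.

  ```python
  import tensorflow as tf

  strategy = tf.distribute.MirroredStrategy()
  with strategy.scope():
      w = tf.Variable(2.0)  # a mirrored variable: one component per device

  @tf.function
  def scale(x):
      # `w` is captured by the function; under a strategy the capture is
      # resolved at call time to the component on the executing device.
      return w * x

  # Each replica executes `scale` against its own copy of `w`.
  result = strategy.run(scale, args=(tf.constant(3.0),))
  ```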
* Support distribution strategies in `Estimator.evaluate`. (Priya Gupta, 2018-07-31)
  PiperOrigin-RevId: 206864512

* Restore tower local variables correctly in init_from_checkpoint. (Priya Gupta, 2018-07-26)
  PiperOrigin-RevId: 206208637

* Add support for `is_tensor_like` property to DistributedValues and add support for calling `assign` on TowerLocalVariables. (Anjali Sridhar, 2018-07-22)
  PiperOrigin-RevId: 205595323

* Refactor properties and functions common to Mirrored and TowerLocal Variables. (Anjali Sridhar, 2018-07-20)
  PiperOrigin-RevId: 205424692

* Add support for MirroredVariables in init_from_checkpoint and warm_start in estimator. (Priya Gupta, 2018-07-17)
  PiperOrigin-RevId: 205030626

* Allow is_initialized and initializer to be called on MirroredVariables and TowerLocalVariables. (Anjali Sridhar, 2018-07-06)
  PiperOrigin-RevId: 203520287
* Add `synchronization` and `aggregation` args to get_variable(). These args will be used for distributed variables. (Pavithra Vijay, 2018-06-29)
  Add Enum `VariableSynchronization` with values for `synchronization`: AUTO, UNREPLICATED, ON_WRITE, ON_READ. Add Enum `VariableAggregation` with values for `aggregation`: NONE, SUM, MEAN. Replace all the aggregation method strings in distribution strategy with the enum values.
  Update Mirrored strategy to use these parameters to decide whether a variable should be Mirrored or TowerLocal. Update the different distribution strategy value types to use the `VariableAggregation` Enum.
  PiperOrigin-RevId: 202736077
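  These arguments survive in current TensorFlow (tf.compat.v1.get_variable and tf.Variable both accept them), so a minimal sketch of the API surface this commit describes might look like the following; the variable name and shape are assumptions made up for illustration.

  ```python
  import tensorflow as tf

  # ON_READ + an aggregation is the combination that makes a variable
  # "tower local" (today: sync-on-read) under a distribution strategy;
  # ON_WRITE makes it mirrored. Outside a strategy scope the args are
  # simply recorded on the variable.
  v = tf.compat.v1.get_variable(
      "running_mean",
      shape=[10],
      trainable=False,
      synchronization=tf.VariableSynchronization.ON_READ,
      aggregation=tf.VariableAggregation.MEAN)
  ```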
* Add an output context that can be used to specify outputs to capture when running multiple steps at a time using the `run_steps_on_dataset` API. It allows the user's step function to specify which outputs to emit at what frequency. Currently it only supports capturing output from the last step, but will soon be augmented to support other use cases such as output each N steps. (Priya Gupta, 2018-06-28)
  PiperOrigin-RevId: 202520245
* Enable assign, assign_add and assign_sub to be called on Mirrored Variables in cross tower and tower context. (Anjali Sridhar, 2018-06-26)
  PiperOrigin-RevId: 202162272
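  A hedged sketch of the two calling contexts the entry above refers to, in today's terminology (cross-replica vs. replica context); the SUM aggregation is an assumption added so the replica-context update is well defined.

  ```python
  import tensorflow as tf

  strategy = tf.distribute.MirroredStrategy()
  with strategy.scope():
      v = tf.Variable(1.0, aggregation=tf.VariableAggregation.SUM)

  # Cross-tower (cross-replica) context: assign directly on the mirrored
  # variable; the update is applied to every component.
  v.assign_add(1.0)

  # Tower (replica) context: each replica computes an update, and the
  # updates are combined according to the variable's `aggregation`.
  strategy.run(lambda: v.assign_add(1.0))
  ```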
* Make regroup work on tower-local variables as well. (A. Unique TensorFlower, 2018-06-21)
  PiperOrigin-RevId: 201554738

* Update mnist eager example with mirrored strategy as some of the methods it was using are now deprecated. (Priya Gupta, 2018-06-20)
  PiperOrigin-RevId: 201478331

* Switch away from DistributionStrategy.fetch() (mostly just in tests) so we can delete it. (A. Unique TensorFlower, 2018-06-20)
  Frequently we can now delete the call entirely, but in other cases we switch to read_var(). This revealed some bugs also fixed in this CL:
  - For MirroredStrategy: fix read_var(mean_tower_local) bug.
  - Support get() for Mirrored values that are not MirroredVariables, and make them DistributedDelegates so we can operate on them in cross-tower mode.
  - Actually iterate through the available devices in MirroredStrategy.get().
  With this and already-submitted 201390698, we can pass mirrored variables and other mirrored values directly to self.evaluate() in tests.
  PiperOrigin-RevId: 201435436

* Make ops.colocate_with work with tower-local variables as well. (Yuefeng Zhou, 2018-06-13)
  PiperOrigin-RevId: 200467472

* Resolve device names when passed into DistributionStrategy methods. (A. Unique TensorFlower, 2018-06-04)
  PiperOrigin-RevId: 199241723

* Checkpointable: move python/training/checkpointable_* to python/training/checkpointable/. (Allen Lavoie, 2018-05-16)
  Need to add some new checkpointable files in core (specifically I had some checkpointable data structures in mind), and prefixing more files with "checkpointable_" in python/training/ seems dirty. No functional changes, just some branching and build/import fiddling.
  PiperOrigin-RevId: 196883136
* Fixes for accessing variables with a MirroredStrategy in a cross-tower context: (A. Unique TensorFlower, 2018-05-07)
  - only provide read-only access to variables via get()
  - don't fail in get() if the variable isn't copied to the current device
  - make _as_graph_element() return the aggregate value for tower-local variables (instead of the incorrect previous behavior of returning the primary)
  PiperOrigin-RevId: 195711474
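  A small sketch of the read path described above, in current API terms, where a "tower-local" variable corresponds to a sync-on-read variable: reading it in cross-replica context yields the aggregated value rather than a single component. The SUM aggregation and variable name are assumptions for illustration.

  ```python
  import tensorflow as tf

  strategy = tf.distribute.MirroredStrategy()
  with strategy.scope():
      # A "tower local" variable in today's terms: synchronized on read.
      total = tf.Variable(0.0, trainable=False,
                          synchronization=tf.VariableSynchronization.ON_READ,
                          aggregation=tf.VariableAggregation.SUM)

  # Each replica accumulates into its own local component.
  strategy.run(lambda: total.assign_add(1.0))

  # Reading in cross-tower (cross-replica) context returns the aggregate
  # across replicas (the SUM here), not the primary component alone.
  print(total.read_value())
  ```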
* Generalize the input to TPU distribution strategy. Add cross-shard-replica sum. (Igor Saprykin, 2018-05-07)
  TPUStrategy passes tests in minimize_loss_test. That caused me to add a capability to have `iterations x cores` inputs of any structure. I also resolved a big number of small issues and uncovered more things to resolve that are documented as todos.
  PiperOrigin-RevId: 195696833

* Use experimental auto_sharding in multi worker dataset. (Priya Gupta, 2018-05-02)
  PiperOrigin-RevId: 195092992

* Add MultiNodeDataset and MultiNodeIterator which are intended to work for multi-node distribution strategy. (Yuefeng Zhou, 2018-04-30)
  PiperOrigin-RevId: 194862215

* When a mirrored variable is fetched in cross-tower mode, fetch its primary variable. (Igor Saprykin, 2018-04-30)
  This prevents errors like:
  ValueError: Fetch argument MirroredVariable({'/job:localhost/replica:0/task:0/device:GPU:0': <tf.Variable 'global_step:0' shape=() dtype=int64>, '/job:localhost/replica:0/task:0/device:GPU:1': <tf.Variable 'global_step/replica_1:0' shape=() dtype=int64>}) cannot be interpreted as a Tensor. (Device /job:localhost/replica:0/task:0/device:CPU:0 not found in ['/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1'] (current device ))
  I ran distribute/examples/resnet with and without the change and it fixed the problem.
  PiperOrigin-RevId: 194828672

* Support variable parameter structure in TPU distribution strategy. (Igor Saprykin, 2018-04-19)
  TPUStrategy is added to a few more tests. There appears to be an issue with the batch norm test in minimize_loss_test where the moving averages stay at 0. I'm trying to resolve that separately as the next CL.
  PiperOrigin-RevId: 193610264
* Support various shapes in TPU DistributionStrategy. (Igor Saprykin, 2018-04-19)
  PiperOrigin-RevId: 193563912

* Add support for initializable iterator in distribution strategies. Use that in estimator. (Priya Gupta, 2018-04-18)
  PiperOrigin-RevId: 193394603

* Merge changes from github. (Scott Zhu, 2018-04-13)
  PiperOrigin-RevId: 192850372

* Add basic serialization support to DistributedVariable (by using the underlying primary variable's serialization). Also, throw an exception when trying to de-serialize as we haven't implemented that yet. (Priya Gupta, 2018-03-29)
  PiperOrigin-RevId: 191022884

* Internal change. (Igor Saprykin, 2018-03-29)
  PiperOrigin-RevId: 191020351

* Add tf.contrib.distribute, which defines classes DistributionStrategy and MirroredStrategy, and related functionality. Also add tf.contrib.optimizer_v2, an update to the Optimizer API. (A. Unique TensorFlower, 2018-03-29)
  RELNOTES: Can now pass tf.contrib.distribute.MirroredStrategy() to tf.estimator.RunConfig() to run an Estimator model on multiple GPUs on one machine.
  PiperOrigin-RevId: 190996247