| Commit message | Author | Age |
attributes, set the attributes of all the contained variables. This fixes a bug where tf.train.init_from_checkpoint did not overwrite the initialization values correctly for TPUMirroredVariable.
PiperOrigin-RevId: 216429476
tf.train.init_from_checkpoint can be supported.
PiperOrigin-RevId: 215843249
output depends on the updates across all mirrors. Before this change,
update() would return a Mirrored value where each component was
an update to a single mirror. This caused a problem: for reading
purposes, other DistributionStrategy methods would consider it okay
to read any single component, so if you, for example, did something
like session.run(strategy.update(...)), it would only perform the
update on one replica. The fix is to have the output be a Mirrored
value that is actually the identity operation returning the output on
that device, but that has a control dependency making sure that the
update actually happens on all the replicas. This fix was already
present in MirroredVariable._assign_func; this CL moves the fix into
update() and generalizes it to multiple return values.
To disable this new grouping behavior, you may now pass
"grouped=False" to update(). For example, some callers (like Optimizer)
are performing a lot of updates and they prefer to group all of them
together at once for performance reasons. In this case, we still want
to make sure the caller executes the update on all replicas, so we
return an unwrapped value instead of a Mirrored value. This has the
happy side effect of removing a bunch of unwrap calls in client code,
since unwrapping was the only safe way to use the Mirrored value we
used to return.
PiperOrigin-RevId: 215301909
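A schematic, framework-free sketch of the grouping behavior described above (the `Mirrored` and `update` names here are illustrative stand-ins, not the real DistributionStrategy API, and plain Python dicts stand in for per-device ops):

```python
class Mirrored:
    """Stand-in for a mirrored value: one component per device (illustrative)."""
    def __init__(self, components):
        self.components = components  # dict: device name -> value

def update(mirrored, fn, grouped=True):
    # Run the update on every replica, not just whichever one a reader
    # happens to pick.
    results = {dev: fn(val) for dev, val in mirrored.components.items()}
    if grouped:
        # Grouped (default): hand back a Mirrored value; in the real fix this
        # wraps identity ops with a control dependency on all the updates.
        return Mirrored(results)
    # Ungrouped: return the raw per-replica results so the caller handles
    # (and therefore executes) each one explicitly.
    return results
```

The point of the sketch is only the contract: with `grouped=True` a reader gets one value that implies all replicas updated; with `grouped=False` the caller sees every replica's result and must deal with each.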
PiperOrigin-RevId: 215027511
PiperOrigin-RevId: 214989908
We will re-enable it when it is more robust.
PiperOrigin-RevId: 214956066
distribution strategies. That is always the appropriate option.
In the existing code, we would set it to a partially specified "worker" name that was ambiguous, and we would end up on the GPU.
PiperOrigin-RevId: 214882658
supported in Graph mode using initializable iterators. In a subsequent change, we'll add in support for Eager mode as well.
This removes prefetching_ops_v2 code.
PiperOrigin-RevId: 214546754
components of a MirroredVariable. We switched to using
`_distributed_container` set in the parent class
`DistributedVariable`, but the code setting `_mirrored_container` was
accidentally added back as a result of a merge.
PiperOrigin-RevId: 211111147
step counter. This allows us to get rid of the increment_var()
function and just use a standard assign_add().
PiperOrigin-RevId: 210743165
- add `_in_graph_mode` property to DistributedVariable
PiperOrigin-RevId: 210177702
being fixed is that when you session.run(assignment), where assignment is the
MirroredVariable value returned by ResourceVariable.assign*, only one
of the components of assignment is executed. Now that it is safer,
allow session.run() on Mirrored values (not just MirroredVariables).
PiperOrigin-RevId: 210149461
wrapper for variables in collections instead of what it wraps.
PiperOrigin-RevId: 210107528
ParameterServerStrategy when using >1 device per machine. This means
wrapping the variable instances returned in that case in a class
that intercepts assign_*() method calls.
PiperOrigin-RevId: 209533673
with few dependencies. This allows us to import this in some places without creating circular dependencies as the original file imported many things.
2. Move the stack used in distribution strategy context to the graph. This allows us to use different strategies in different graphs (e.g., in train and eval).
This fixes #21412 and #21180.
PiperOrigin-RevId: 208680454
this for MirroredStrategy and OneDeviceStrategy. Implemented in TPUStrategy earlier.
PiperOrigin-RevId: 207961939
Before this change, when a function was called in a distribution
strategy context, it would capture the component variables from some
device and always use those variables, even when the function was
executed on a different device.
This CL "reevaluates" distributed variables to get the correct variable
at call time. These correct variables are then passed to the function.
We don't handle distributed tensors. First, because the mechanics for handling
distributed tensors are different from handling distributed variables,
their support added significant complexity to already complex defuns.
Second, there is no easy way for users to have a function capture a distributed
tensor or feed a distributed tensor explicitly. If this changes, we can
support them (the code exists in this CL's history).
We also don't handle distributed variables explicitly passed into the
function, for similar reasons.
PiperOrigin-RevId: 207640908
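A minimal sketch of the call-time "reevaluation" idea described above. The `DistributedVariable` and `call_on_device` names are hypothetical stand-ins; the real mechanism resolves components against the runtime's device context rather than taking the device as an argument:

```python
class DistributedVariable:
    """Stand-in: one component value per device (illustrative, not the real class)."""
    def __init__(self, components):
        self.components = components  # dict: device name -> value

def call_on_device(fn, device, *dvars):
    # Resolve each distributed variable to the component for the device the
    # call runs on, instead of baking in a component captured at trace time.
    return fn(*(v.components[device] for v in dvars))
```

Capturing at trace time would freeze one device's component into the function; resolving at call time picks the right component for wherever the function actually executes.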
PiperOrigin-RevId: 206864512
PiperOrigin-RevId: 206208637
support for calling `assign` on TowerLocalVariables.
PiperOrigin-RevId: 205595323
PiperOrigin-RevId: 205424692
estimator.
PiperOrigin-RevId: 205030626
TowerLocalVariables.
PiperOrigin-RevId: 203520287
will be used for distributed variables.
Add Enum `VariableSynchronization` with values for `synchronization`: AUTO, UNREPLICATED, ON_WRITE, ON_READ
Add Enum `VariableAggregation` with values for `aggregation`: NONE, SUM, MEAN. Replace all the aggregation method strings in distribution strategy with the enum values.
Update Mirrored strategy to use these parameters to decide on whether a variable should be Mirrored or TowerLocal.
Update the different distribution strategy value types to use the `VariableAggregation` Enum.
PiperOrigin-RevId: 202736077
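A sketch of the two enums, using exactly the member names listed in this message (the real definitions live in TensorFlow and may differ; the `is_tower_local` helper is a hypothetical illustration of the Mirrored-vs-TowerLocal decision mentioned above):

```python
import enum

class VariableSynchronization(enum.Enum):
    """When a distributed variable's copies are synchronized (sketch)."""
    AUTO = 0
    UNREPLICATED = 1
    ON_WRITE = 2
    ON_READ = 3

class VariableAggregation(enum.Enum):
    """How copies of a distributed variable are combined (sketch)."""
    NONE = 0
    SUM = 1
    MEAN = 2

def is_tower_local(sync, agg):
    # Hypothetical decision rule: ON_READ synchronization with a real
    # aggregation method is what backs a TowerLocal variable; ON_WRITE
    # backs a Mirrored one.
    return (sync is VariableSynchronization.ON_READ
            and agg is not VariableAggregation.NONE)
```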
running multiple steps at a time using the `run_steps_on_dataset` API. It allows the user's step function to specify which outputs to emit at what frequency. Currently it only supports capturing output from the last step, but it will soon be augmented to support other use cases, such as emitting output every N steps.
PiperOrigin-RevId: 202520245
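The currently supported "capture output from the last step" behavior can be sketched in plain Python (the `run_steps` name and signature are illustrative, not the real `run_steps_on_dataset` API):

```python
def run_steps(step_fn, batches, num_steps):
    """Run `step_fn` on `num_steps` batches, keeping only the last output."""
    it = iter(batches)
    last_output = None
    for _ in range(num_steps):
        # Each step consumes one batch; earlier outputs are discarded,
        # matching the last-step-only capture described above.
        last_output = step_fn(next(it))
    return last_output
```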
in cross tower and tower context.
PiperOrigin-RevId: 202162272
PiperOrigin-RevId: 201554738
was using are now deprecated.
PiperOrigin-RevId: 201478331
so we can delete it. Frequently we can now delete the call entirely,
but in other cases we switch to read_var().
This revealed some bugs also fixed in this CL:
* For MirroredStrategy: fix read_var(mean_tower_local) bug.
* Support get() for Mirrored values that are not MirroredVariables,
and make them DistributedDelegates so we can operate on them in
cross-tower mode.
* Actually iterate through the available devices in MirroredStrategy.get().
With this and already-submitted 201390698, we can pass mirrored
variables and other mirrored values directly to self.evaluate() in
tests.
PiperOrigin-RevId: 201435436
PiperOrigin-RevId: 200467472
PiperOrigin-RevId: 199241723
python/training/checkpointable/
Need to add some new checkpointable files in core (specifically I had some checkpointable data structures in mind), and prefixing more files with "checkpointable_" in python/training/ seems dirty.
No functional changes, just some branching and build/import fiddling.
PiperOrigin-RevId: 196883136
cross-tower context:
* only provide read-only access to variables via get()
* don't fail if the variable isn't copied to the current device in
get()
* make _as_graph_element() return the aggregate value for tower-local
variables (instead of the incorrect previous behavior of returning
the primary)
PiperOrigin-RevId: 195711474
TPUStrategy passes tests in minimize_loss_test. That caused me to add a capability to have `iterations x cores` inputs of any structure. I also resolved a large number of small issues and uncovered more things to resolve, which are documented as TODOs.
PiperOrigin-RevId: 195696833
PiperOrigin-RevId: 195092992
multi-node distribution strategy.
PiperOrigin-RevId: 194862215
variable.
This prevents errors like
ValueError: Fetch argument MirroredVariable({'/job:localhost/replica:0/task:0/device:GPU:0': <tf.Variable 'global_step:0' shape=() dtype=int64>, '/job:localhost/replica:0/task:0/device:GPU:1': <tf.Variable 'global_step/replica_1:0' shape=() dtype=int64>}) cannot be interpreted as a Tensor. (Device /job:localhost/replica:0/task:0/device:CPU:0 not found in ['/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1'] (current device ))
I ran distribute/examples/resnet with and without the change and it fixed the problem.
PiperOrigin-RevId: 194828672
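The fetch fallback described above can be sketched as follows (the `fetch_component` name and device strings are illustrative; the real code hooks into session fetch conversion for MirroredVariables):

```python
def fetch_component(components, current_device):
    """Pick a concrete component of a mirrored variable for fetching.

    Before the fix, fetching from a device with no copy (e.g. CPU:0 when
    the copies live on GPUs) raised the ValueError quoted above; now the
    fetch falls back to the primary component.
    """
    if current_device in components:
        return components[current_device]
    # Treat the first device as the primary, as a stand-in for the real
    # notion of a mirrored variable's primary component.
    primary_device = next(iter(components))
    return components[primary_device]
```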
TPUStrategy is added to a few more tests.
There appears to be an issue with the batch norm test in minimize_loss_test where the moving averages stay at 0. I'm trying to resolve that separately as the next CL.
PiperOrigin-RevId: 193610264
PiperOrigin-RevId: 193563912
in estimator.
PiperOrigin-RevId: 193394603
PiperOrigin-RevId: 192850372
underlying primary variable's serialization). Also, throw an exception when trying to de-serialize as we haven't implemented that yet.
PiperOrigin-RevId: 191022884
PiperOrigin-RevId: 191020351
and MirroredStrategy, and related functionality.
Also add tf.contrib.optimizer_v2, an update to the Optimizer API.
RELNOTES: Can now pass tf.contrib.distribute.MirroredStrategy() to
tf.estimator.RunConfig() to run an Estimator model on multiple GPUs
on one machine.
PiperOrigin-RevId: 190996247