Instead, return a friendlier failed Status from the following two methods, which
used to CHECK-fail: GetIncomingPreds and FindUniqueBackedge.
While at it, also rename GetIncomingPreds to GetInputPreds to be consistent with
the variable names.
PiperOrigin-RevId: 215758757
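The pattern above — replacing a CHECK-fail with a Status the caller can handle — can be sketched as follows. This is a minimal illustration, not the TensorFlow code: the simplified `Status` and `Node` types and the negative-index error condition are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for tensorflow::Status and Node (hypothetical,
// for illustration only).
struct Status {
  bool ok_ = true;
  std::string message;
  static Status OK() { return {}; }
  static Status InvalidArgument(std::string m) { return {false, std::move(m)}; }
  bool ok() const { return ok_; }
};

struct Node {
  std::vector<int> in_edges;
};

// Before: a malformed input would CHECK-fail, crashing the process.
// After: the problem surfaces as a failed Status the caller can inspect.
Status GetInputPreds(const Node& node, std::vector<int>* preds) {
  for (int e : node.in_edges) {
    if (e < 0) {
      return Status::InvalidArgument("unexpected negative edge index");
    }
    preds->push_back(e);
  }
  return Status::OK();
}
```

A caller checks `status.ok()` and propagates the error instead of dying mid-pass.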
PiperOrigin-RevId: 215757701
PiperOrigin-RevId: 215624875
can use to accelerate transfers.
PiperOrigin-RevId: 215362667
PiperOrigin-RevId: 215324035
The previous version was hitting a very slow path in `GetNodeAttr()`, which is expensive when the named attr is not found. This change inlines the logic of finding the two relevant attrs inside `GetFunctionNameAttr()` and avoids constructing a status object with a serialized `NodeDef` when the attr can't be found.
PiperOrigin-RevId: 215298411
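The idea — probe the attr map directly and treat "not found" as a cheap, expected outcome rather than building a rich error object — can be sketched like this. This is a hypothetical illustration: the `AttrMap` alias and the `"f"`/`"function_name"` attr names are stand-ins, not the actual TensorFlow lookup code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a NodeDef's attr map.
using AttrMap = std::map<std::string, std::string>;

// Fast pattern: a direct probe that signals "missing" with a null pointer.
// A general-purpose lookup that serializes the whole NodeDef into an error
// message on every miss would be far more expensive on this path.
const std::string* FindAttr(const AttrMap& attrs, const std::string& name) {
  auto it = attrs.find(name);
  return it == attrs.end() ? nullptr : &it->second;
}

// GetFunctionNameAttr-style helper: check the two relevant attrs inline.
std::string GetFunctionName(const AttrMap& attrs) {
  if (const std::string* f = FindAttr(attrs, "f")) return *f;
  if (const std::string* fn = FindAttr(attrs, "function_name")) return *fn;
  return "";  // A miss is common here, so no error object is constructed.
}
```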
This cleanup will make the future CL implementing lazy compilation simpler.
Includes some supporting changes:
- Teach NewInternalScope to create a scope that doesn't do shape inference. We
need this because we don't have a ShapeRefiner that has been run over the
entire graph available in the build_xla_ops pass.
- Add a WithAssignedDevice modifier to tensorflow::Scope.
- Make cc_op_gen write out an Operation field for nodes which may not
necessarily have any outputs. We already did this in most cases, but we
weren't doing it for nodes that have possibly-empty list outputs.
- Minor change renaming ops/xla_jit_op.cc to ops/xla_jit_ops.cc, now that we
have more than one XLA JIT op.
PiperOrigin-RevId: 215293817
PiperOrigin-RevId: 215272497
requested device placement of the XlaLaunch op must be derived from the subgraph.
PiperOrigin-RevId: 215239672
PiperOrigin-RevId: 215183847
Even with this bug we were accidentally doing the right thing (so the test case
doesn't actually fail without the fix): deleting an Edge sets its input and
output indices to kControlSlot-1 so we'd normally expect to fail when there is a
control edge out of the TF cluster (because a control edge would be recognized
as a data edge). But AddEdge(x, -1, y, -1) seems to do the right thing for both
control and data edges.
PiperOrigin-RevId: 214831204
The purpose of these ops is to fix a latency problem observed for an inference benchmark. Often an inference step starts by reading the values of many (hundreds of) weights. For a resource variable, this requires a VarHandleOp and a ReadVariableOp per variable. Running hundreds of trivial ops can add hundreds of microseconds of latency to the critical path of an inference step. The inter-op latency of the executor can be hundreds of nanoseconds, which rapidly adds up.
This change introduces two fused ops _VarHandlesOp and _ReadVariablesOp that allow us to read many variables in a pair of larger ops, rather than many tiny ops.
PiperOrigin-RevId: 214662338
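The fusion idea can be sketched as follows: amortize per-op dispatch overhead by reading a whole batch of variables in one call. This is a minimal, hypothetical sketch (a plain map stands in for resource variables), not the actual `_ReadVariablesOp` kernel.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variable store standing in for resource variables.
using VarStore = std::unordered_map<std::string, float>;

// Unfused pattern: one dispatch per variable. Hundreds of weights mean
// hundreds of tiny ops, each paying the executor's per-op overhead.
float ReadVariable(const VarStore& store, const std::string& name) {
  return store.at(name);
}

// Fused pattern (in the spirit of _ReadVariablesOp): one dispatch reads a
// whole batch, amortizing the per-op overhead across all variables.
std::vector<float> ReadVariables(const VarStore& store,
                                 const std::vector<std::string>& names) {
  std::vector<float> values;
  values.reserve(names.size());
  for (const auto& name : names) values.push_back(store.at(name));
  return values;
}
```

With per-op overhead on the order of hundreds of nanoseconds, collapsing N tiny reads into one batched op removes roughly (N-1) dispatches from the critical path.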
SnapshotResourceVariables function.
PiperOrigin-RevId: 214488033
It wasn't actually needed.
PiperOrigin-RevId: 214346217
All devices implement the same tracing logic in an override of `Device::Compute()`. However, that logic does not have access to the cached `NodeItem::kernel_is_expensive` bit for the kernel, so it must make a virtual call to `OpKernel::IsExpensive()`. By inlining the logic into `ExecutorState::Process()`, we avoid making an unnecessary virtual call on each kernel invocation (when a trace controller is attached).
PiperOrigin-RevId: 214332492
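The optimization — consulting a bit cached at graph-setup time instead of making a virtual call on every kernel invocation — can be sketched like this. The types below are simplified, hypothetical stand-ins for `OpKernel` and `NodeItem`, not the executor's actual data structures.

```cpp
#include <cassert>

// Hypothetical kernel with a virtual IsExpensive(), as OpKernel has.
struct Kernel {
  virtual ~Kernel() = default;
  virtual bool IsExpensive() const { return false; }
};

// Per-node executor state; the answer is cached once when the graph is set
// up, mirroring NodeItem::kernel_is_expensive.
struct NodeItem {
  const Kernel* kernel = nullptr;
  bool kernel_is_expensive = false;  // Cached at graph-setup time.
};

// Hot path: read the cached bit. No virtual dispatch per invocation.
bool ShouldTreatAsExpensive(const NodeItem& item) {
  return item.kernel_is_expensive;
}
```

The cached bit is filled in once from `kernel->IsExpensive()` during setup, so the steady-state cost per invocation is a plain load rather than a vtable call.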
So far, just the clustered graph is dumped.
PiperOrigin-RevId: 213994376
This CL splits the functionality in XlaLaunch into two separate operations:
- XlaCompile, responsible for compiling a TF function into a LocalExecutable
- XlaRun, responsible for executing a LocalExecutable created by XlaCompile
This CL is a stepping stone towards implementing lazy compilation for TF/XLA.
The XlaCompile op is spec'ed to return a boolean indicating whether the
compilation was successful. Right now that boolean is always set to true by
XlaCompile and its value is otherwise ignored, but in the future it will be used
to indicate whether the TF function was compiled or not, and thus whether we
should execute XlaRun or just directly call the TF function.
XlaLaunch still exists, and will be created by create_xla_launch_op.cc. In the
future we may consider removing it altogether. build_xla_launch_ops.cc, now
renamed to build_xla_ops.cc, creates a XlaCompile/XlaRun pair instead of
XlaLaunch.
This CL is organized as follows:
- jit/ops/xla_ops.cc gets two new XLA-specific operations, XlaCompile and
XlaRun, described above. XlaRun redundantly takes the must-be-constant
inputs to the TensorFlow cluster to keep the implementation simple (simple in
the sense of similar to XlaLaunch), but I will remove this in a subsequent
cleanup CL.
- jit/kernels/xla_ops.cc implements XlaCompile and XlaRun in a fairly
straightforward manner. XlaCompile compiles the TF function, puts it in a
process-global storage, XlaExecutableClosureStore, and produces an int64 key.
XlaRun uses the key to read out the LocalExecutable and execute it. I'm not
sure if XlaExecutableClosureStore should be a resource like
XlaCompilationCache; I did not immediately see any reason to make it so.
- There are changes to the various _device files to register XlaCompile and
XlaRun for the XLA_* devices.
- Finally, I had to fix some tests that were expecting XlaLaunch in the
execution timeline.
PiperOrigin-RevId: 213895405
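The XlaCompile/XlaRun handoff described above can be sketched as a process-global keyed store: the compile side inserts an artifact and gets back an int64 key, and the run side redeems the key. This is a minimal sketch in the spirit of XlaExecutableClosureStore; a plain string stands in for the LocalExecutable, and the exact API is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Process-global store mapping int64 keys to compiled artifacts.
class ClosureStore {
 public:
  static ClosureStore& Global() {
    static ClosureStore* store = new ClosureStore;  // Never destroyed.
    return *store;
  }

  // XlaCompile side: stash the executable, hand back a key.
  int64_t Produce(std::string executable) {
    std::lock_guard<std::mutex> lock(mu_);
    int64_t key = next_key_++;
    closures_.emplace(key, std::move(executable));
    return key;
  }

  // XlaRun side: redeem the key exactly once.
  std::string Consume(int64_t key) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = closures_.find(key);
    if (it == closures_.end()) return "";  // Invalid or already-consumed key.
    std::string result = std::move(it->second);
    closures_.erase(it);
    return result;
  }

 private:
  std::mutex mu_;
  int64_t next_key_ = 0;
  std::unordered_map<int64_t, std::string> closures_;
};
```

Keeping the store process-global (rather than a per-session resource) is the simpler design the commit mentions; a resource-manager-backed version would tie closure lifetime to a session instead.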
These have the same behavior as unquantized types so we can just pass them
through to XLA (which converts them to unquantized types). They're supposed to
be used with special ops, none of which are currently implemented by XLA.
Casting (without quantization) and basic math works fine though.
These do not have a corresponding numpy type, so only tests using TF types will
see them.
PiperOrigin-RevId: 213781650
PiperOrigin-RevId: 213770000
PiperOrigin-RevId: 213653853
PiperOrigin-RevId: 213574904
I first tried to fix this issue in cr/209996730 but didn't quite fix the problem
for XLA_* devices. A node assigned to an XLA_* device must be compiled, so
the cr/209996730 fix of simply not compiling the nodes doesn't generalize to
XLA_* devices. Instead we now "isolate" these nodes, only putting them in a
trivial one-node cluster. For non-XLA devices even this trivial cluster is
ignored because of flags->tf_xla_min_cluster_size.
I was initially considering a more principled data-flow-analysis based solution
but then decided the upfront work isn't worth it until I see a clear motivating
example.
PiperOrigin-RevId: 213531437
Before this CL the PartiallyDeclusterPassTest.DontDuplicateResourceVarOps test
was buggy, in that it wasn't testing what it was supposed to test.
PiperOrigin-RevId: 213501558
The test changes are awkward. None of these are XLA bugs, it's just that the op
definitions in tensorflow are really inconsistent. I tried to infer whether the
limitation is on signed types, index types, or just arbitrary. In the latter
case just int8/uint8 is blacklisted; we should probably lift that requirement
at some point.
PiperOrigin-RevId: 213243906
I need these to write readable unit tests for TF graph transformations. All of
my use cases will live inside tensorflow/compiler so putting it in
tensorflow/compiler/jit for now; but we can move these out if other users are
interested.
In the future we may want to auto-generate type safe versions of these from the
op registrations like we generate C++ wrappers today.
PiperOrigin-RevId: 213186810
PiperOrigin-RevId: 212896336
optimization pass, instead of a step in XlaCompiler.".
PiperOrigin-RevId: 212657932
PiperOrigin-RevId: 212465918
have been explicitly marked to be compiled via xla.compile()
PiperOrigin-RevId: 212407112
This is needed when the graph contains custom call ops. These functions are found only in the graph's registry and not the default one.
PiperOrigin-RevId: 212297305
PiperOrigin-RevId: 212289067
PiperOrigin-RevId: 212182923
a step in XlaCompiler.
PiperOrigin-RevId: 212164482
The CL is organized as follows:
- The main change is in jit/partially_decluster_pass.
- tf2xla/const_analysis now takes an "edge_filter" to facilitate use by
jit/partially_decluster_pass.
- tests/dense_layer_test.py was using the execution of ListDiff as what I
assume is a sanity check to see that the XLA cluster ran. With this CL the
ListDiff op gets declustered so we now check for "MatMult" for the sanity
check.
- Some tests were dropping TF_XLA_FLAGS; fixed them to not do so.
PiperOrigin-RevId: 212071118
PiperOrigin-RevId: 212002568
PiperOrigin-RevId: 211895566
PiperOrigin-RevId: 211733735
consistently
StringPiece is an alias for absl::string_view, and InlinedVector is aliased to absl::InlinedVector. StrCat is compatible, so swapping it out is safe.
PiperOrigin-RevId: 211691840
I want --vmodule=xla_compilation_cache=1 to print only the most essential
things.
PiperOrigin-RevId: 211676846
PiperOrigin-RevId: 210998142
XlaLocalLaunchBase was modifying platform_id_ without a lock, which is racy
because the same OpKernel can be executed concurrently. Fix this by inferring
platform_id_ in the kernel constructor.
While at it, also make use_multiple_streams_ and xla_device_metadata_ member
variables.
PiperOrigin-RevId: 210751494
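The race fix follows a standard pattern: compute the value once in the constructor and make it immutable afterward, so concurrent Compute() calls only ever read it. A minimal sketch, with hypothetical simplified types (not the real OpKernel machinery):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for OpKernelConstruction.
struct KernelConstruction {
  std::string device_type;
};

// Before (racy): a member written lazily inside Compute(), which may run
// concurrently on the same kernel instance.
// After (fixed): the value is derived once in the constructor; state that is
// immutable after construction needs no lock.
class LaunchKernel {
 public:
  explicit LaunchKernel(const KernelConstruction& ctx)
      : platform_id_(ctx.device_type == "GPU" ? 1 : 0) {}

  // Safe to call concurrently: platform_id_ is never written after
  // construction.
  int platform_id() const { return platform_id_; }

 private:
  const int platform_id_;  // Inferred once, in the constructor.
};
```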
PiperOrigin-RevId: 210467779
There are a couple of reasons to do this:
- Resource handles are regular tensors that are part of a public API and
can potentially be returned from a function.
- When tfe.defun is executed under GradientTape, it generates a
function returning resource handles in certain cases.
This CL adds support for returning resource handles from an XLA
compiled function. These resource handles must have been passed as
arguments to the function. In other words, we don't yet support
returning resources created inside the function. tfe.defun never
makes functions that create resources.
PiperOrigin-RevId: 210442856
Of {fusable, fusile, fusible} my dictionary only knows about fusible.
PiperOrigin-RevId: 210373347
PiperOrigin-RevId: 210317627
PiperOrigin-RevId: 210130976
executor until transfers from host to device are complete.
PiperOrigin-RevId: 210098914
This is a cleanup on cr/208763036. Instead of spreading information about
resource ops between jit/mark_for_compilation_pass and
jit/resource_operation_safety_analysis we now have
tf2xla/resource_operation_table own it.
PiperOrigin-RevId: 210044178
tensor.h soon)
We plan to remove the import of variant.h from tensor.h; variant.h brings in a lot
of transitive imports (including protos like tensor.proto.h). To prepare, we're
updating code that this will break.
PiperOrigin-RevId: 210043667
PiperOrigin-RevId: 210042392