path: root/tensorflow/compiler/jit
* Don't CHECK-fail on malformed graphs in deadness analysis (Sanjoy Das, 2018-10-04)
  Instead, return a friendlier failed Status from the two methods that used to CHECK-fail: GetIncomingPreds and FindUniqueBackedge. While at it, also rename GetIncomingPreds to GetInputPreds to be consistent with the variable names.
  PiperOrigin-RevId: 215758757
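  A minimal sketch of the pattern this change adopts, with assumed names (predicate_map_, InputEdgeToTensorId) standing in for the actual deadness-analysis code:

    // Sketch only: surface a malformed graph as a failed Status instead of
    // crashing the process. Previously: CHECK(it != predicate_map_.end()).
    Status GetInputPreds(const Node* n, std::vector<Predicate*>* out_preds) {
      for (const Edge* in_edge : n->in_edges()) {
        auto it = predicate_map_.find(InputEdgeToTensorId(in_edge));
        if (it == predicate_map_.end()) {
          return errors::Internal("Could not find predicate for input ",
                                  in_edge->DebugString(), " to ", n->name());
        }
        out_preds->push_back(it->second);
      }
      return Status::OK();
    }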
* Automated rollback of commit f22037abf5a6f4581f5fb6013f72f91747f22965 (A. Unique TensorFlower, 2018-10-04)
  PiperOrigin-RevId: 215757701
* Move out-params to end of argument list and add an out_ prefix; NFC (Sanjoy Das, 2018-10-03)
  PiperOrigin-RevId: 215624875
* Add a hint parameter to TransferLiteralToDeviceAsync that the implementation can use to accelerate transfers. (A. Unique TensorFlower, 2018-10-02)
  PiperOrigin-RevId: 215362667
* [XLA] Migrate from gtl::FlatSet to absl::flat_hash_set (Benjamin Kramer, 2018-10-01)
  PiperOrigin-RevId: 215324035
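  The migration is largely a type-name swap, since gtl::FlatSet exposes a flat_hash_set-compatible API. A self-contained illustration:

    #include <string>
    #include "absl/container/flat_hash_set.h"

    int main() {
      // Before: tensorflow::gtl::FlatSet<std::string> seen;
      absl::flat_hash_set<std::string> seen;
      seen.insert("cluster_0");
      return seen.contains("cluster_0") ? 0 : 1;  // same core API as FlatSet
    }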
* [TF/XLA] Optimize `Encapsulator::GetFunctionNameAttr()`. (Derek Murray, 2018-10-01)
  The previous version was hitting a very slow path in `GetNodeAttr()`, which is expensive when the named attr is not found. This change inlines the logic of finding the two relevant attrs inside `GetFunctionNameAttr()` and avoids constructing a status object with a serialized `NodeDef` when the attr can't be found.
  PiperOrigin-RevId: 215298411
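  A hedged sketch of the fast path; the attr name and surrounding details are assumptions, not the actual Encapsulator code:

    // Probe the attr map directly. GetNodeAttr()'s not-found path builds a
    // Status whose message embeds a serialized NodeDef, which is expensive.
    Status GetFunctionNameAttr(const Node& node, string* out_func_name) {
      out_func_name->clear();
      const AttrValue* v = node.attrs().Find("_xla_function");  // assumed name
      if (v != nullptr && v->value_case() == AttrValue::kS) {
        *out_func_name = v->s();
      }
      return Status::OK();  // a missing attr is not an error on this path
    }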
* Clean up the build_xla_ops pass to use the generated C++ TF op wrappers. (Sanjoy Das, 2018-10-01)
  This cleanup will make the future CL implementing lazy compilation simpler. Includes some supporting changes:
  - Teach NewInternalScope to create a scope that doesn't do shape inference. We need this because in the build_xla_ops pass we don't have a ShapeRefiner that has been run over the entire graph.
  - Add a WithAssignedDevice modifier to tensorflow::Scope.
  - Make cc_op_gen write out an Operation field for nodes that may not necessarily have any outputs. We already did this in most cases, but we weren't doing it for nodes that have possibly-empty list outputs.
  - Rename ops/xla_jit_op.cc to ops/xla_jit_ops.cc, now that we have more than one XLA JIT op.
  PiperOrigin-RevId: 215293817
* [XLA] Migrate from gtl::FlatMap to absl::flat_hash_map (Benjamin Kramer, 2018-10-01)
  PiperOrigin-RevId: 215272497
* Bugfix: when a subgraph is encapsulated and replaced by an XlaLaunch op, the requested device placement of the XlaLaunch op must be derived from the subgraph. (A. Unique TensorFlower, 2018-10-01)
  PiperOrigin-RevId: 215239672
* [TF:XLA] Teach deadness analysis more of the distributive property. (A. Unique TensorFlower, 2018-10-01)
  PiperOrigin-RevId: 215183847
* Don't use tensorflow::Edge after freeing it (Sanjoy Das, 2018-09-27)
  Even with this bug we were accidentally doing the right thing (so the test case doesn't actually fail without the fix): deleting an Edge sets its input and output indices to kControlSlot-1, so we'd normally expect to fail when there is a control edge out of the TF cluster (because a control edge would be recognized as a data edge). But AddEdge(x, -1, y, -1) seems to do the right thing for both control and data edges.
  PiperOrigin-RevId: 214831204
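  The shape of the fix, as a sketch with illustrative names: read everything you need from the Edge before the graph mutation frees it.

    const Edge* e = /* an edge leaving the TF cluster */;
    Node* src = e->src();
    Node* dst = e->dst();
    int src_output = e->src_output();  // kControlSlot (-1) for control edges
    int dst_input = e->dst_input();
    graph->RemoveEdge(e);  // frees e and poisons its slot indices
    graph->AddEdge(src, src_output, dst, dst_input);  // no use of e after free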
* [TF] Add new internal ops _VarHandlesOp and _ReadVariablesOp. (Peter Hawkins, 2018-09-26)
  The purpose of these ops is to fix a latency problem observed for an inference benchmark. Often an inference step starts by reading the values of many (hundreds of) weights. For a resource variable, this requires a VarHandleOp and a ReadVariableOp per variable. Running hundreds of trivial ops can add hundreds of microseconds of latency to the critical path of an inference step. The inter-op latency of the executor can be hundreds of nanoseconds, which rapidly adds up. This change introduces two fused ops _VarHandlesOp and _ReadVariablesOp that allow us to read many variables in a pair of larger ops, rather than many tiny ops.
  PiperOrigin-RevId: 214662338
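  A sketch of what the fused read can look like as an op registration; this mirrors the description above but is not copied from the actual registration:

    // One kernel reads N variables, replacing N VarHandleOp/ReadVariableOp
    // pairs and their per-op executor overhead.
    REGISTER_OP("_ReadVariablesOp")
        .Attr("N: int >= 0")
        .Input("resources: N * resource")
        .Attr("dtypes: list(type)")
        .Output("values: dtypes");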
* Fix memory leaks of Var objects in XlaCompileOnDemandOp and the SnapshotResourceVariables function. (Derek Murray, 2018-09-25)
  PiperOrigin-RevId: 214488033
* Remove the "constants" input group from _XlaRun; NFCGravatar Sanjoy Das2018-09-24
| | | | | | It wasn't actually needed. PiperOrigin-RevId: 214346217
* Inline kernel tracing logic into `ExecutorState::Process()`. (Derek Murray, 2018-09-24)
  All devices implement the same tracing logic in an override of `Device::Compute()`. However, that logic does not have access to the cached `NodeItem::kernel_is_expensive` bit for the kernel, so it must make a virtual call to `OpKernel::IsExpensive()`. By inlining the logic into `ExecutorState::Process()`, we avoid making an unnecessary virtual call on each kernel invocation (when a trace controller is attached).
  PiperOrigin-RevId: 214332492
* [XLA] Dump the original, unclustered graph with --tf_xla_clustering_debug. (Thomas Joerg, 2018-09-21)
  Previously, only the clustered graph was dumped.
  PiperOrigin-RevId: 213994376
* Split XlaLaunch into XlaCompile and XlaRun; NFC (Sanjoy Das, 2018-09-20)
  This CL splits the functionality in XlaLaunch into two separate operations:
  - XlaCompile, responsible for compiling a TF function into a LocalExecutable
  - XlaRun, responsible for executing a LocalExecutable created by XlaCompile
  This CL is a stepping stone towards implementing lazy compilation for TF/XLA. The XlaCompile op is spec'ed to return a boolean indicating whether the compilation was successful. Right now that boolean is always set to true by XlaCompile and its value is otherwise ignored, but in the future it will be used to indicate whether the TF function was compiled or not, and thus whether we should execute XlaRun or just directly call the TF function.
  XlaLaunch still exists, and will be created by create_xla_launch_op.cc. In the future we may consider removing it altogether. build_xla_launch_ops.cc, now renamed to build_xla_ops.cc, creates an XlaCompile/XlaRun pair instead of XlaLaunch.
  This CL is organized as follows:
  - jit/ops/xla_ops.cc gets two new XLA-specific operations, XlaCompile and XlaRun, described above. XlaRun redundantly takes the must-be-constant inputs to the TensorFlow cluster to keep the implementation simple (simple in the sense of similar to XlaLaunch), but I will remove this in a subsequent cleanup CL.
  - jit/kernels/xla_ops.cc implements XlaCompile and XlaRun in a fairly straightforward manner. XlaCompile compiles the TF function, puts it in a process-global storage, XlaExecutableClosureStore (sketched below), and produces an int64 key. XlaRun uses the key to read out the LocalExecutable and execute it. I'm not sure if XlaExecutableClosureStore should be a resource like XlaCompilationCache; I did not immediately see any reason to make it so.
  - There are changes to the various _device files to register XlaCompile and XlaRun for the XLA_* devices.
  - Finally, I had to fix some tests that were expecting XlaLaunch in the execution timeline.
  PiperOrigin-RevId: 213895405
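  A hedged sketch of the process-global store; the names, signatures, and key type here are assumptions about the design described above:

    class XlaExecutableClosureStore {
     public:
      using KeyT = int64;

      KeyT Produce(XlaExecutableClosure closure) {
        mutex_lock l(mu_);
        KeyT key = next_key_++;
        CHECK(closures_.emplace(key, std::move(closure)).second);
        return key;
      }

      // Destructive read: each key produced by XlaCompile is redeemed
      // exactly once by the matching XlaRun.
      XlaExecutableClosure Consume(KeyT key) {
        mutex_lock l(mu_);
        auto it = closures_.find(key);
        CHECK(it != closures_.end());
        XlaExecutableClosure result = std::move(it->second);
        closures_.erase(it);
        return result;
      }

     private:
      mutex mu_;
      KeyT next_key_ GUARDED_BY(mu_) = 0;
      absl::flat_hash_map<KeyT, XlaExecutableClosure> closures_ GUARDED_BY(mu_);
    };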
* [XLA:TF] Whitelist quantized types for CPU/GPU (Benjamin Kramer, 2018-09-20)
  These have the same behavior as unquantized types, so we can just pass them through to XLA (which converts them to unquantized types). They're supposed to be used with special ops, none of which are currently implemented by XLA. Casting (without quantization) and basic math work fine, though. These do not have a corresponding numpy type, so only tests using TF types will see them.
  PiperOrigin-RevId: 213781650
* Internal change. (A. Unique TensorFlower, 2018-09-20)
  PiperOrigin-RevId: 213770000
* Return an error message with the illegal input rather than check-failing in op_kernel. (Jacques Pienaar, 2018-09-19)
  PiperOrigin-RevId: 213653853
* Add xla.compile(), a low-level API that compiles a graph with XLA. (Yanan Cao, 2018-09-18)
  PiperOrigin-RevId: 213574904
* "Isolate" must-be-constant side effecting operationsGravatar Sanjoy Das2018-09-18
| | | | | | | | | | | | | | | I first tried to fix this issue in cr/209996730 but didn't quite fix the problem for for XLA_* devices. A node assigned to an XLA_* device must be compiled so the cr/209996730 fix of simply not compiling the nodes doesn't generalize to XLA_* devices. Instead we now "isolate" these nodes, only putting them in a trivial one-node cluster. For non-XLA devices even this trivial cluster is ignored because of flags->tf_xla_min_cluster_size. I was initially considering a more principled data-flow-analysis based solution but then decided the upfront work isn't worth it until I see a clear motivating example. PiperOrigin-RevId: 213531437
* Register FakeResourceUpdateOp for the right op (Sanjoy Das, 2018-09-18)
  Before this CL the PartiallyDeclusterPassTest.DontDuplicateResourceVarOps test was buggy, in that it wasn't testing what it was supposed to test.
  PiperOrigin-RevId: 213501558
* [XLA:TF] Enable int8 and uint8 support in the bridge for CPU/GPU (Benjamin Kramer, 2018-09-17)
  The test changes are awkward. None of these are XLA bugs; it's just that the op definitions in tensorflow are really inconsistent. I tried to infer whether the limitation is on signed types, index types, or just arbitrary. In the latter case just int8/uint8 is blacklisted; we should probably lift that requirement at some point.
  PiperOrigin-RevId: 213243906
* Introduce gmock matchers for TensorFlow nodes (Sanjoy Das, 2018-09-16)
  I need these to write readable unit tests for TF graph transformations. All of my use cases will live inside tensorflow/compiler, so putting it in tensorflow/compiler/jit for now; but we can move these out if other users are interested. In the future we may want to auto-generate type-safe versions of these from the op registrations, like we generate C++ wrappers today.
  PiperOrigin-RevId: 213186810
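  Illustrative usage in a test; the matcher names follow the style described above but are assumptions here:

    // Asserts that `add` is an Add node consuming the outputs of `a` and
    // `b`, without hand-written graph-walking boilerplate.
    EXPECT_THAT(add, NodeWith(Op("Add"), Inputs(Out(a), Out(b))));
    EXPECT_THAT(add, NodeWith(Name("my_add")));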
* Automated rollback of commit ac60b46e2c5962fd8099a4406c1788d826ad3c0d (A. Unique TensorFlower, 2018-09-13)
  PiperOrigin-RevId: 212896336
* Roll forward change "Move control flow functionalization as a graph optimization pass, instead of a step in XlaCompiler." (Tong Shen, 2018-09-12)
  PiperOrigin-RevId: 212657932
* Automated rollback of commit 45965cfd8b54fb113275ffdaced5366e28aa3553 (Yanan Cao, 2018-09-11)
  PiperOrigin-RevId: 212465918
* Graph optimization pass that creates XlaLaunch ops for the computations that have been explicitly marked to be compiled via xla.compile() (A. Unique TensorFlower, 2018-09-11)
  PiperOrigin-RevId: 212407112
* Bug fix: consult graph's op registry to look up ops. (A. Unique TensorFlower, 2018-09-10)
  This is needed when the graph contains custom call ops. These functions are found only in the graph's registry and not the default one.
  PiperOrigin-RevId: 212297305
* Automated rollback of commit d6f107761459dfdf8773a148e11193a3512a51a6 (A. Unique TensorFlower, 2018-09-10)
  PiperOrigin-RevId: 212289067
* Automated rollback of commit a3776a234f555213aafcf41f49a42a8a9448c4ac (Tong Shen, 2018-09-09)
  PiperOrigin-RevId: 212182923
* Move control flow functionalization as a graph optimization pass, instead of a step in XlaCompiler. (Tong Shen, 2018-09-09)
  PiperOrigin-RevId: 212164482
* Decluster some must-be-constant ops to reduce XLA recompilations (Sanjoy Das, 2018-09-07)
  The CL is organized as follows:
  - The main change is in jit/partially_decluster_pass.
  - tf2xla/const_analysis now takes an "edge_filter" to facilitate use by jit/partially_decluster_pass (see the sketch below).
  - tests/dense_layer_test.py was using the execution of ListDiff as what I assume is a sanity check to see that the XLA cluster ran. With this CL the ListDiff op gets declustered, so we now check for "MatMult" for the sanity check.
  - Some tests were dropping TF_XLA_FLAGS; fixed them to not do so.
  PiperOrigin-RevId: 212071118
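  A hedged sketch of what such an edge filter can look like; the lambda shape and the GetXlaClusterForNode helper are assumptions, not the actual const_analysis signature:

    // Only follow edges that stay within one XLA cluster, so that values
    // crossing the cluster boundary are not forced to be compile-time
    // constants.
    auto edge_filter = [](const Edge& e) {
      return GetXlaClusterForNode(*e.src()) == GetXlaClusterForNode(*e.dst());
    };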
* Delete parallel_check_op_flags. (Sanjoy Das, 2018-09-07)
  PiperOrigin-RevId: 212002568
* Automated rollback of commit 24787842adfefe35f5a520313d775b14c29f143a (A. Unique TensorFlower, 2018-09-06)
  PiperOrigin-RevId: 211895566
* [tf.data] Move all C++ code inside the `tensorflow::data` namespace. (Derek Murray, 2018-09-05)
  PiperOrigin-RevId: 211733735
* [XLA] Make tensorflow/compiler use absl::{StrCat,string_view,InlinedVector} consistently (Benjamin Kramer, 2018-09-05)
  StringPiece is an alias for absl::string_view, and InlinedVector is aliased to absl::InlinedVector. StrCat is compatible, so swapping it out is safe.
  PiperOrigin-RevId: 211691840
* Make logging less verbose (Sanjoy Das, 2018-09-05)
  I want --vmodule=xla_compilation_cache=1 to print only the most essential things.
  PiperOrigin-RevId: 211676846
* [XLA] Rename all (Mutable)ArraySlice to absl::Span. (Tim Shen, 2018-08-30)
  PiperOrigin-RevId: 210998142
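  Like the hash-container migrations above, this is mechanical: gtl::ArraySlice<T> was already an alias for absl::Span<const T>, so call sites keep working. A self-contained example:

    #include <cstdint>
    #include <vector>
    #include "absl/types/span.h"

    // Before: int64_t Product(tensorflow::gtl::ArraySlice<int64_t> dims);
    int64_t Product(absl::Span<const int64_t> dims) {
      int64_t p = 1;
      for (int64_t d : dims) p *= d;
      return p;
    }

    int main() {
      std::vector<int64_t> dims = {2, 3, 4};
      return Product(dims) == 24 ? 0 : 1;  // std::vector converts implicitly
    }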
* Fix a race condition in XlaLocalLaunchBase. (Sanjoy Das, 2018-08-29)
  XlaLocalLaunchBase was modifying platform_id_ without a lock, which is racy because the same OpKernel can be executed concurrently. Fix this by inferring platform_id_ in the kernel constructor. While at it, also make use_multiple_streams_ and xla_device_metadata_ member variables.
  PiperOrigin-RevId: 210751494
* Print cross-cluster edges; NFC (Sanjoy Das, 2018-08-27)
  PiperOrigin-RevId: 210467779
* Support returning resource handles from a function in XLA (Igor Ganichev, 2018-08-27)
  There are a couple of reasons to do this:
  - Resource handles are regular tensors, part of a public API, that can potentially be returned from a function.
  - When tfe.defun is executed under GradientTape, it generates a function returning resource handles in certain cases.
  This CL adds support for returning resource handles from an XLA-compiled function. These resource handles must have been passed as arguments to the function. In other words, we don't yet support returning resources created inside the function. tfe.defun never makes functions that create resources.
  PiperOrigin-RevId: 210442856
* [XLA] Unify spelling of 'fusible' (Benjamin Kramer, 2018-08-27)
  Of {fusable, fusile, fusible}, my dictionary only knows about fusible.
  PiperOrigin-RevId: 210373347
* Reduce the overhead of invoking an OpKernel when tracing is not enabled. (Derek Murray, 2018-08-26)
  PiperOrigin-RevId: 210317627
* Make compile_options a mandatory const-ref argument. (Sanjoy Das, 2018-08-24)
  PiperOrigin-RevId: 210130976
* Memory ordering is enforced by streams, so we don't need to block TensorFlow's executor until transfers from host to device are complete. (A. Unique TensorFlower, 2018-08-24)
  PiperOrigin-RevId: 210098914
* [tf2xla] Re-organize information about resource ops in one place; NFC (Sanjoy Das, 2018-08-23)
  This is a cleanup on cr/208763036. Instead of spreading information about resource ops between jit/mark_for_compilation_pass and jit/resource_operation_safety_analysis, we now have tf2xla/resource_operation_table own it.
  PiperOrigin-RevId: 210044178
* Directly import tensor.proto.h (the transitive import will be removed from tensor.h soon) (Eugene Brevdo, 2018-08-23)
  We plan to remove the import of variant.h from tensor.h; variant.h brings in a lot of transitive imports (including protos like tensor.proto.h). To prepare, we're updating the code this will break.
  PiperOrigin-RevId: 210043667
* Use absl::make_unique in tf2xla. (Justin Lebar, 2018-08-23)
  PiperOrigin-RevId: 210042392
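  The swap is mechanical: absl::make_unique mirrors std::make_unique, which C++11 builds cannot rely on. A minimal example:

    #include "absl/memory/memory.h"

    struct Foo {
      explicit Foo(int x) : x(x) {}
      int x;
    };

    int main() {
      // Before: std::unique_ptr<Foo> p(new Foo(42));
      auto p = absl::make_unique<Foo>(42);  // no raw new, exception-safe
      return p->x == 42 ? 0 : 1;
    }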