Overhaul XLA:CPU's calling convention.

This CL introduces a clean separation between calls to "thread local" and "global" computations in XLA:CPU.

Global computations are:
 - kWhile body and condition computations
 - kConditional true and false computations
 - kCall callees

Parameter and result buffers for these calls are assigned a static BufferAllocation::Slice by buffer assignment, so they don't require pointers to result buffers and parameters to be explicitly passed in. In fact, passing in result and parameter buffers is actively misleading because in cases like:

  while_condition {
    val = (s32[], pred[]) infeed()
    ROOT result = get-tuple-element(val), index=0
  }

there is no instruction explicitly copying the result of the computation into the result buffer. Instead, it is up to the caller to pick up the correct result buffer by asking buffer assignment (which would be the buffer where infeed wrote its second tuple component).

Thread local computations are all the other nested computations except fusion, e.g. computations used by kMap and kReduce. Parameter and result buffers for these calls are assigned a "thread local" BufferAllocation::Slice, which in XLA:CPU is mapped to an alloca. Since these are not static addresses, we *do* need to pass in parameter and result buffers. The output is written to the result buffer by "allocating" the storage for the root into the result buffer passed in by the caller.

There are two cleanup items that I kept off this CL to make reviews easier:
 - We should rename "temps" to something more generic, like "buffer_table". I'll do that in a followup CL.
 - We should use GatherComputationsByAllocationType from buffer_assignment.cc to CHECK that we use thread local calls for thread local callees and global calls for global callees.

PiperOrigin-RevId: 206843794
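
The two conventions described above can be pictured with a small sketch. This is not XLA code: the function names, the slot indices, and the void** stand-in for the buffer table are all hypothetical and chosen only for illustration. The global callee finds its inputs and outputs in pre-assigned slots, while the thread local callee receives explicit pointers to stack storage.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the table of statically assigned buffers
// (the "temps" / "buffer_table" mentioned in the message).
using BufferTable = void**;

// Made-up slot indices; in the real compiler these would come from
// buffer assignment.
constexpr int kCounterSlot = 0;
constexpr int kPredSlot = 1;

// "Global" style callee: no parameter or result pointers are passed.
// It reads and writes slots that were chosen ahead of time.
void WhileConditionGlobal(BufferTable table) {
  auto* counter = static_cast<int32_t*>(table[kCounterSlot]);
  auto* pred = static_cast<bool*>(table[kPredSlot]);
  *pred = *counter < 3;
}

// "Thread local" style callee: the storage is not at a static address,
// so the caller passes parameter and result pointers explicitly.
void AddThreadLocal(const int32_t* lhs, const int32_t* rhs, int32_t* result) {
  *result = *lhs + *rhs;
}

// Caller of the thread local callee: accumulator and result live on the
// stack, loosely analogous to the allocas described in the message.
int32_t SumWithThreadLocalCalls(const int32_t* data, int n) {
  assert(n >= 0);
  int32_t acc = 0;
  for (int i = 0; i < n; ++i) {
    int32_t next = 0;
    AddThreadLocal(&acc, &data[i], &next);
    acc = next;
  }
  return acc;
}

// Caller of the global callee: after the call it picks up the result
// from the slot it knows the callee wrote to.
int CountIterationsWithGlobalCalls() {
  int32_t counter = 0;
  bool pred = false;
  void* table[] = {&counter, &pred};
  int iterations = 0;
  for (;;) {
    WhileConditionGlobal(table);
    if (!*static_cast<bool*>(table[kPredSlot])) break;
    ++counter;
    ++iterations;
  }
  return iterations;
}
```

In this sketch the caller of the global callee never hands over a result pointer; it knows which slot to read, which mirrors the "ask buffer assignment" step described in the message.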
author: Sanjoy Das <sanjoy@google.com> 2018-07-31 16:00:25 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2018-07-31 16:10:50 -0700
commit: fba2d773f45f10882aa475ac75cbf9884995d626 (patch)
tree: 8e9b550b73fee6955b946b3b461137e0f3eee4ff /tensorflow/compiler/aot
parent: 4f656df0c0897f9f50931824298e4bfc7f757707 (diff)
1 files changed, 3 insertions, 1 deletions
diff --git a/tensorflow/compiler/aot/runtime.cc b/tensorflow/compiler/aot/runtime.cc
index 5e74079fc1..475eebaa35 100644
--- a/tensorflow/compiler/aot/runtime.cc
+++ b/tensorflow/compiler/aot/runtime.cc
@@ -85,7 +85,9 @@ void* MallocContiguousBuffers(const intptr_t* sizes, size_t n, void** bufs,
   }
   uintptr_t pos = reinterpret_cast<uintptr_t>(contiguous);
   for (size_t i = 0; i < n; ++i) {
-    if (sizes[i] == -1) {
+    if (sizes[i] < 0) {
+      // bufs[i] is either a constant, an entry parameter or a thread local
+      // allocation.
       bufs[i] = nullptr;
     } else {
       bufs[i] = reinterpret_cast<void*>(pos);
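
As a rough illustration of what the changed condition means for callers, the following standalone sketch carves slots out of one contiguous block and leaves a null entry for every negative size. It is not the code from runtime.cc: the names CarveBuffers, TotalBytes, AlignUp and the 64-byte alignment constant are assumptions made for this example, and plain malloc is used for the block, so the base address is only as aligned as malloc makes it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed alignment for this sketch only.
constexpr std::size_t kSketchAlign = 64;

// Rounds n up to the next multiple of kSketchAlign.
std::size_t AlignUp(std::size_t n) {
  return (n + kSketchAlign - 1) / kSketchAlign * kSketchAlign;
}

// Sums the aligned sizes of all entries that need storage.
// Negative sizes are treated as "no storage from this block".
std::size_t TotalBytes(const std::intptr_t* sizes, std::size_t n) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (sizes[i] >= 0) total += AlignUp(static_cast<std::size_t>(sizes[i]));
  }
  return total;
}

// Allocates one block and points each bufs[i] into it. Entries with a
// negative size get nullptr. Returns the block, which the caller frees
// with std::free, or nullptr if allocation failed.
void* CarveBuffers(const std::intptr_t* sizes, std::size_t n, void** bufs) {
  const std::size_t total = TotalBytes(sizes, n);
  void* block = std::malloc(total > 0 ? total : 1);
  if (block == nullptr) return nullptr;
  std::uintptr_t pos = reinterpret_cast<std::uintptr_t>(block);
  for (std::size_t i = 0; i < n; ++i) {
    if (sizes[i] < 0) {
      bufs[i] = nullptr;
    } else {
      bufs[i] = reinterpret_cast<void*>(pos);
      pos += AlignUp(static_cast<std::size_t>(sizes[i]));
    }
  }
  return block;
}
```

Testing for any negative value rather than exactly -1 means that additional negative markers can be introduced without touching the allocation loop.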