author | Sanjoy Das <sanjoy@google.com> | 2018-07-31 16:00:25 -0700
---|---|---
committer | TensorFlower Gardener <gardener@tensorflow.org> | 2018-07-31 16:10:50 -0700
commit | fba2d773f45f10882aa475ac75cbf9884995d626 |
tree | 8e9b550b73fee6955b946b3b461137e0f3eee4ff /tensorflow/compiler/aot |
parent | 4f656df0c0897f9f50931824298e4bfc7f757707 |
Overhaul XLA:CPU's calling convention.
This CL introduces a clean separation between calls to "thread local" and
"global" computations in XLA:CPU.
Global computations are:
- kWhile body and condition computations
- kConditional true and false computations
- kCall callees
Parameter and result buffers for these calls are assigned a static
BufferAllocation::Slice by buffer assignment, so pointers to them don't need
to be passed in explicitly. In fact, passing in result and parameter buffers is
actively misleading, because in cases like:
while_condition {
  val = (s32[], pred[]) infeed()
  ROOT result = get-tuple-element(val), index=1
}
there is no instruction explicitly copying the result of the computation into
the result buffer. Instead, it is up to the caller to pick up the correct
result buffer by asking buffer assignment (which would be the buffer where
infeed wrote its second tuple component).
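A minimal C++ sketch of the global convention, modeled on the while_condition example. The buffer table, the slice indices, the FakeInfeed stand-in and all function names are hypothetical; in XLA:CPU the slice indices come from buffer assignment. The point it illustrates is that the callee receives only the buffer table, and the caller reads the result from the slice where infeed wrote its second tuple component.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical slice indices into the buffer table ("temps") that buffer
// assignment might pick for the while_condition example.
constexpr int kInfeedS32Slice = 0;   // infeed's first tuple component
constexpr int kInfeedPredSlice = 1;  // infeed's second tuple component

// Stand-in for the infeed queue; not part of any real XLA API.
struct FakeInfeed {
  std::int32_t value;
  bool pred;
};
FakeInfeed g_fake_infeed = {0, false};

// A "global" computation receives only the buffer table. It gets no
// parameter or result pointers: everything it touches sits at a slice
// that buffer assignment fixed ahead of time.
void WhileConditionGlobal(void** buffer_table) {
  // val = (s32[], pred[]) infeed(): write both tuple components.
  *static_cast<std::int32_t*>(buffer_table[kInfeedS32Slice]) =
      g_fake_infeed.value;
  *static_cast<bool*>(buffer_table[kInfeedPredSlice]) = g_fake_infeed.pred;
  // The ROOT get-tuple-element selecting the pred component emits no
  // copy: the root's buffer is simply the slice infeed already wrote.
}

// The caller asks "buffer assignment" (here, a constant) where the root
// lives and reads the condition value from that slice.
bool RunWhileCondition(void** buffer_table) {
  WhileConditionGlobal(buffer_table);
  return *static_cast<bool*>(buffer_table[kInfeedPredSlice]);
}
```

Passing a result pointer into WhileConditionGlobal would suggest the callee copies into it, which is exactly what never happens here.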
Thread local computations are all the other nested computations except fusion,
e.g. computations used by kMap and kReduce.
Parameter and result buffers for these calls are assigned "thread local"
BufferAllocation::Slices, which XLA:CPU maps to allocas. Since these are not
static addresses, we *do* need to pass in parameter and result buffers. The
callee writes its output to the result buffer by "allocating" the storage for
its root instruction in the result buffer passed in by the caller.
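For contrast, a sketch of the thread local convention, assuming a hypothetical ThreadLocalFn signature (a result pointer plus an array of parameter pointers) for the add computation of a kReduce. Local variables in the caller stand in for the allocas; this illustrates the convention and is not XLA:CPU's generated code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical signature for a thread local computation: parameters and
// result are passed as pointers because their storage lives in the
// caller's frame (allocas), not at static slices.
using ThreadLocalFn = void (*)(float* result, const float* const* params);

// add(x, y): the root's storage *is* the caller-provided result buffer,
// so the root value is written straight into it.
void AddComputation(float* result, const float* const* params) {
  *result = *params[0] + *params[1];
}

// Caller: a sequential reduce loop. The accumulator and the per-call
// result are locals, passed by pointer on every call.
float ReduceWith(ThreadLocalFn fn, const float* data, std::size_t n,
                 float init) {
  float acc = init;
  for (std::size_t i = 0; i < n; ++i) {
    float result;  // per-call result buffer (the "alloca")
    const float* params[2] = {&acc, &data[i]};
    fn(&result, params);
    acc = result;
  }
  return acc;
}
```

Because each call gets fresh addresses, nothing about the callee's buffers can be fixed statically, which is why the pointers must be passed.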
There are two cleanup items that I kept out of this CL to make reviews easier:
 - We should rename "temps" to something more generic, like "buffer_table".
   I'll do that in a follow-up CL.
- We should use GatherComputationsByAllocationType from buffer_assignment.cc to
CHECK that we use thread local calls for thread local callees and global
calls for global callees.
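The CHECK proposed in the second cleanup item could look roughly like the following. This sketch swaps in a simpler classification than GatherComputationsByAllocationType: it keys on a hypothetical calling-opcode enum using the categories listed above, rather than on buffer allocation types. All names are illustrative.

```cpp
#include <cassert>

// Hypothetical subset of calling-instruction opcodes.
enum class CallerOpcode { kWhile, kConditional, kCall, kMap, kReduce, kFusion };

// How a callee should be invoked. kExcluded covers fusion computations,
// which the commit message places in neither category.
enum class CallKind { kGlobal, kThreadLocal, kExcluded };

CallKind ExpectedCallKind(CallerOpcode op) {
  switch (op) {
    case CallerOpcode::kWhile:
    case CallerOpcode::kConditional:
    case CallerOpcode::kCall:
      return CallKind::kGlobal;
    case CallerOpcode::kFusion:
      return CallKind::kExcluded;
    default:
      return CallKind::kThreadLocal;  // kMap, kReduce, ...
  }
}

// Miniature of the proposed CHECK: the convention the emitter used must
// match the callee's category; a fusion callee matches neither.
bool CallKindMatches(CallerOpcode op, CallKind emitted) {
  CallKind expected = ExpectedCallKind(op);
  return expected != CallKind::kExcluded && expected == emitted;
}
```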
PiperOrigin-RevId: 206843794
Diffstat (limited to 'tensorflow/compiler/aot')
-rw-r--r-- | tensorflow/compiler/aot/runtime.cc | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/tensorflow/compiler/aot/runtime.cc b/tensorflow/compiler/aot/runtime.cc
index 5e74079fc1..475eebaa35 100644
--- a/tensorflow/compiler/aot/runtime.cc
+++ b/tensorflow/compiler/aot/runtime.cc
@@ -85,7 +85,9 @@ void* MallocContiguousBuffers(const intptr_t* sizes, size_t n, void** bufs,
   }
   uintptr_t pos = reinterpret_cast<uintptr_t>(contiguous);
   for (size_t i = 0; i < n; ++i) {
-    if (sizes[i] == -1) {
+    if (sizes[i] < 0) {
+      // bufs[i] is either a constant, an entry parameter or a thread local
+      // allocation.
       bufs[i] = nullptr;
     } else {
       bufs[i] = reinterpret_cast<void*>(pos);
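A self-contained sketch of the patched loop in context. The total-size pass, the 64-byte alignment (kSketchAlign), the use of std::aligned_alloc and the function name are assumptions for illustration; only the loop body mirrors the diff. With the change, any negative size leaves bufs[i] null, not just -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed alignment for this sketch only.
constexpr std::size_t kSketchAlign = 64;

std::size_t AlignTo(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// A negative size marks a buffer the runtime must not allocate (a constant,
// an entry parameter or a thread local allocation), so bufs[i] stays null.
// Non-negative sizes get consecutive, aligned chunks of one block.
void* SketchMallocContiguousBuffers(const std::intptr_t* sizes, std::size_t n,
                                    void** bufs) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (sizes[i] >= 0) {
      total += AlignTo(static_cast<std::size_t>(sizes[i]), kSketchAlign);
    }
  }
  void* contiguous = nullptr;
  if (total > 0) {
    // total is a multiple of kSketchAlign, as aligned_alloc requires.
    contiguous = std::aligned_alloc(kSketchAlign, total);
    if (contiguous == nullptr) return nullptr;
  }
  std::uintptr_t pos = reinterpret_cast<std::uintptr_t>(contiguous);
  for (std::size_t i = 0; i < n; ++i) {
    if (sizes[i] < 0) {
      bufs[i] = nullptr;
    } else {
      bufs[i] = reinterpret_cast<void*>(pos);
      pos += AlignTo(static_cast<std::size_t>(sizes[i]), kSketchAlign);
    }
  }
  return contiguous;  // caller releases it with std::free
}
```

In the test, -2 is just an arbitrary negative value. Under the old `== -1` comparison it would have fallen into the else branch and been treated as a real buffer.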