| field | value |
|---|---|
| author | 2018-08-01 13:17:07 -0700 |
| committer | 2018-08-01 13:21:38 -0700 |
| commit | 774c34fa2153193ee7af899bb9a4b72d384dea61 |
| tree | 2abbe641e2472dce089b4d3c92fadcd1d14a951f /tensorflow/compiler/aot |
| parent | c709503f31fa9be07c5ee62ed1eb7fb29c2aaa73 |
Reland "Overhaul XLA:CPU's calling convention."
This reland fixes aligned_buffer_bytes in compiler/aot/runtime.cc, which was
checking sizes[i] == -1 (as opposed to sizes[i] < 0) to decide whether sizes[i]
should count towards the total size.
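To see why the exact comparison matters, here is a minimal sketch that isolates the two predicates. It assumes, hypothetically, that generated code may emit a negative size other than -1 (the value -2 in the usage is illustrative, not taken from XLA). CountedBeforeFix, CountedAfterFix and CountCounted are invented names, not part of runtime.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Predicate used by aligned_buffer_bytes before the fix: only the exact
// sentinel -1 is excluded, so any other negative size is still counted.
bool CountedBeforeFix(intptr_t size) { return size != -1; }

// Predicate after the fix: every negative size is excluded. Zero is excluded
// too, which is harmless because a zero-sized buffer needs no bytes.
bool CountedAfterFix(intptr_t size) { return size > 0; }

// Hypothetical helper: counts how many entries of `sizes` a predicate would
// add to the contiguous allocation's total.
size_t CountCounted(const intptr_t* sizes, size_t n,
                    bool (*counted)(intptr_t)) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i) {
    if (counted(sizes[i])) ++count;
  }
  return count;
}
```

With sizes {100, -1, -2, 10}, the old predicate counts three buffers (including the -2 entry, whose conversion to size_t is meaningless) while the new one counts only the two real buffers.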
Original CL description:
Overhaul XLA:CPU's calling convention.
This CL introduces a clean separation between calls to "thread local" and
"global" computations in XLA:CPU.
Global computations are:
- kWhile body and condition computations
- kConditional true and false computations
- kCall callees
Parameter and result buffers for these calls are assigned a static
BufferAllocation::Slice by buffer assignment, so they don't require pointers
to result buffers and parameters to be explicitly passed in. In fact, passing
in result and parameter buffers is actively misleading because in cases like:

    while_condition {
      val = (s32[], pred[]) infeed()
      ROOT result = get-tuple-element(val), index=1
    }

there is no instruction explicitly copying the result of the computation into
the result buffer. Instead, it is up to the caller to pick up the correct
result buffer by asking buffer assignment (which would be the buffer where
infeed wrote its second tuple component).
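A minimal C++ sketch of this global calling convention for the while_condition case, under loudly stated assumptions: the buffer table is modelled as a plain void** array, the constant slot numbers stand in for the answer buffer assignment would give, and the infeed is faked with two globals. None of these names (WhileCondition, EvaluateCondition, kS32Slot, kPredSlot, g_infeed_*) come from XLA.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the data the infeed would deliver.
static int32_t g_infeed_s32 = 0;
static bool g_infeed_pred = false;

// Hypothetical slot numbers that buffer assignment might pick for the two
// tuple components written by infeed.
constexpr int kS32Slot = 0;
constexpr int kPredSlot = 1;

// Sketch of a "global" computation: it receives only the buffer table and
// writes into statically assigned slots. No parameter or result pointers are
// passed in.
void WhileCondition(void** temps) {
  // val = (s32[], pred[]) infeed(): each component lands in its own slot.
  std::memcpy(temps[kS32Slot], &g_infeed_s32, sizeof(g_infeed_s32));
  std::memcpy(temps[kPredSlot], &g_infeed_pred, sizeof(g_infeed_pred));
  // ROOT get-tuple-element of the pred[] component: no copy is emitted.
}

// The caller looks up where the root lives (a constant slot here, standing
// in for a buffer assignment query) and reads the result from there.
bool EvaluateCondition(void** temps) {
  WhileCondition(temps);
  bool result;
  std::memcpy(&result, temps[kPredSlot], sizeof(result));
  return result;
}
```

The design point the sketch tries to capture is that the callee never learns where its "result" goes; only the caller, consulting the static slot assignment, knows which buffer holds the root.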
Thread local computations are all the other nested computations except fusion,
e.g. computations used by kMap and kReduce.
Parameter and result buffers for these calls are assigned a "thread local"
BufferAllocation::Slice, which in XLA:CPU is mapped to an alloca. Since these
are not static addresses, we *do* need to pass in parameter and result buffers.
The output is written to the result buffer by "allocating" the storage for the
root in the result buffer passed in by the caller.
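By contrast, a thread-local callee receives explicit pointers. The sketch below models a kReduce combiner as an ordinary C++ function; AddCombiner and ReduceSum are hypothetical names, and the caller's local variables play the role of the allocas. It is an analogy for the convention, not XLA's generated code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical thread-local computation (a kReduce combiner): parameters and
// the result live in caller-owned stack storage, so the caller passes
// explicit pointers. The root is "allocated" into `result`, i.e. written
// directly to the buffer the caller provided.
void AddCombiner(float* result, const float* lhs, const float* rhs) {
  *result = *lhs + *rhs;
}

// A reduce loop that invokes the combiner once per element.
float ReduceSum(const float* data, size_t n, float init) {
  float acc = init;  // accumulator on the caller's stack (alloca analogue)
  for (size_t i = 0; i < n; ++i) {
    float out;  // result slot handed to the callee
    AddCombiner(&out, &acc, &data[i]);
    acc = out;
  }
  return acc;
}
```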
There are two cleanup items that I kept out of this CL to make reviews easier:
- We should rename "temps" to something more generic, like "buffer_table".
I'll do that in a followup CL.
- We should use GatherComputationsByAllocationType from buffer_assignment.cc to
CHECK that we use thread local calls for thread local callees and global
calls for global callees.
PiperOrigin-RevId: 206980796
Diffstat (limited to 'tensorflow/compiler/aot')
-rw-r--r-- | tensorflow/compiler/aot/runtime.cc | 6 |
1 files changed, 4 insertions, 2 deletions

    diff --git a/tensorflow/compiler/aot/runtime.cc b/tensorflow/compiler/aot/runtime.cc
    index 5e74079fc1..7606420ded 100644
    --- a/tensorflow/compiler/aot/runtime.cc
    +++ b/tensorflow/compiler/aot/runtime.cc
    @@ -64,7 +64,7 @@ size_t align_to(size_t n, size_t align) {
     size_t aligned_buffer_bytes(const intptr_t* sizes, size_t n) {
       size_t total = 0;
       for (size_t i = 0; i < n; ++i) {
    -    if (sizes[i] != -1) {
    +    if (sizes[i] > 0) {
           total += align_to(sizes[i], kAlign);
         }
       }
    @@ -85,7 +85,9 @@ void* MallocContiguousBuffers(const intptr_t* sizes, size_t n, void** bufs,
       }
       uintptr_t pos = reinterpret_cast<uintptr_t>(contiguous);
       for (size_t i = 0; i < n; ++i) {
    -    if (sizes[i] == -1) {
    +    if (sizes[i] < 0) {
    +      // bufs[i] is either a constant, an entry parameter or a thread local
    +      // allocation.
           bufs[i] = nullptr;
         } else {
           bufs[i] = reinterpret_cast<void*>(pos);
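The post-fix rule in MallocContiguousBuffers can be sketched without allocating anything: non-negative sizes receive aligned offsets into one contiguous block, while negative sizes receive no storage. The 64-byte alignment and the names AlignTo and LayoutContiguousBuffers are assumptions for this sketch; the offsets stand in for the pointers the real function stores in bufs, with -1 playing the role of nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed alignment for this sketch; the real kAlign is defined in runtime.cc.
constexpr size_t kAlign = 64;

// Round n up to the next multiple of align.
size_t AlignTo(size_t n, size_t align) {
  return (n + align - 1) / align * align;
}

// Hypothetical layout helper mirroring the patched loop: a negative size
// (constant, entry parameter or thread-local allocation) gets no storage,
// marked here with -1; every other entry gets an aligned offset.
// Returns the total number of bytes the contiguous block needs.
size_t LayoutContiguousBuffers(const std::vector<intptr_t>& sizes,
                               std::vector<intptr_t>* offsets) {
  offsets->assign(sizes.size(), -1);
  size_t pos = 0;
  for (size_t i = 0; i < sizes.size(); ++i) {
    if (sizes[i] < 0) continue;  // analogous to bufs[i] = nullptr
    (*offsets)[i] = static_cast<intptr_t>(pos);
    pos += AlignTo(static_cast<size_t>(sizes[i]), kAlign);
  }
  return pos;
}
```

Note that a zero-sized buffer still receives an offset (shared with the next buffer), matching the diff, where only sizes[i] < 0 yields nullptr; it simply adds no bytes to the total.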