author Sanjoy Das <sanjoy@google.com> 2018-08-01 13:17:07 -0700
committer TensorFlower Gardener <gardener@tensorflow.org> 2018-08-01 13:21:38 -0700
commit 774c34fa2153193ee7af899bb9a4b72d384dea61
tree 2abbe641e2472dce089b4d3c92fadcd1d14a951f /tensorflow/compiler/aot
parent c709503f31fa9be07c5ee62ed1eb7fb29c2aaa73
Reland "Overhaul XLA:CPU's calling convention."
aligned_buffer_bytes in compiler/aot/runtime.cc was checking sizes[i] == -1 (as opposed to checking sizes[i] < 0) to decide whether sizes[i] should count towards the total size.

Original CL description:

Overhaul XLA:CPU's calling convention.

This CL introduces a clean separation between calls to "thread local" and "global" computations in XLA:CPU.

Global computations are:

 - kWhile body and condition computations
 - kConditional true and false computations
 - kCall callees

Parameters and results buffers for these calls are assigned a static BufferAllocation::Slice by buffer assignment and so they don't require pointers to result buffers and parameters to be explicitly passed in. In fact, passing in result and parameters buffers is actively misleading because in cases like:

  while_condition {
    val = (s32[], pred[]) infeed()
    ROOT result = get-tuple-element(val), index=0
  }

there is no instruction explicitly copying the result of the computation into the result buffer. Instead, it is up to the caller to pick up the correct result buffer by asking buffer assignment (which would be buffer where infeed wrote its second tuple component).

Thread local computations are all the other nested computations except fusion, e.g. computations used by kMap and kReduce.

Parameters and result buffers for these calls are assigned a "thread local" BufferAllocation::Slice which in XLA:CPU are mapped to allocas. Since these are not static addresses, we *do* need to pass in parameter and result buffers. The output is written to the result buffer by "allocating" the storage for the root into the result buffer passed in by the caller.

There are two cleanup items that I kept off this CL to make reviews easier:

 - We should rename "temps" to something more generic, like "buffer_table". I'll do that in a followup CL.
 - We should use GatherComputationsByAllocationType from buffer_assignment.cc to CHECK that we use thread local calls for thread local callees and global calls for global callees.

PiperOrigin-RevId: 206980796
Diffstat (limited to 'tensorflow/compiler/aot')
-rw-r--r-- tensorflow/compiler/aot/runtime.cc | 6
1 files changed, 4 insertions, 2 deletions
diff --git a/tensorflow/compiler/aot/runtime.cc b/tensorflow/compiler/aot/runtime.cc
index 5e74079fc1..7606420ded 100644
--- a/tensorflow/compiler/aot/runtime.cc
+++ b/tensorflow/compiler/aot/runtime.cc
@@ -64,7 +64,7 @@ size_t align_to(size_t n, size_t align) {
 size_t aligned_buffer_bytes(const intptr_t* sizes, size_t n) {
   size_t total = 0;
   for (size_t i = 0; i < n; ++i) {
-    if (sizes[i] != -1) {
+    if (sizes[i] > 0) {
       total += align_to(sizes[i], kAlign);
     }
   }
@@ -85,7 +85,9 @@ void* MallocContiguousBuffers(const intptr_t* sizes, size_t n, void** bufs,
   }
   uintptr_t pos = reinterpret_cast<uintptr_t>(contiguous);
   for (size_t i = 0; i < n; ++i) {
-    if (sizes[i] == -1) {
+    if (sizes[i] < 0) {
+      // bufs[i] is either a constant, an entry parameter or a thread local
+      // allocation.
       bufs[i] = nullptr;
     } else {
       bufs[i] = reinterpret_cast<void*>(pos);
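
The effect of the changed size check can be illustrated with a small standalone sketch. It is not the code in runtime.cc: AlignTo, AlignedBufferBytes, SizeCheck and kAlignSketch are hypothetical names, and the alignment value of 64 is an assumption, since the real kAlign is not visible in this diff. The sketch only shows why a negative entry other than -1 corrupts the total under the old comparison and is skipped under the new one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed alignment for illustration; the real kAlign is not shown in the diff.
constexpr size_t kAlignSketch = 64;

// Rounds n up to the next multiple of align.
size_t AlignTo(size_t n, size_t align) {
  assert(align > 0);
  return ((n + align - 1) / align) * align;
}

// Selects which comparison decides whether sizes[i] counts towards the total.
enum class SizeCheck {
  kBeforeReland,  // sizes[i] != -1, the check removed by this commit
  kAfterReland,   // sizes[i] > 0, the check added by this commit
};

// Sums the aligned sizes of all entries that the chosen check accepts.
size_t AlignedBufferBytes(const intptr_t* sizes, size_t n, SizeCheck check) {
  size_t total = 0;
  for (size_t i = 0; i < n; ++i) {
    const bool counts = (check == SizeCheck::kAfterReland) ? sizes[i] > 0
                                                           : sizes[i] != -1;
    if (counts) {
      // A negative size converts to a huge unsigned value here, which is how
      // the old check could produce a wrong (wrapped) total.
      total += AlignTo(static_cast<size_t>(sizes[i]), kAlignSketch);
    }
  }
  return total;
}
```

Calling AlignedBufferBytes on the sizes {100, -100, 8} with SizeCheck::kAfterReland counts only the two positive entries, while SizeCheck::kBeforeReland also feeds -100 into the alignment arithmetic and returns a different total. For an array whose only negative entry is -1, both checks agree.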