[XLA:CPU] Adds intra-op parallelism to the "sequential" CPU backend (which already has intra-op parallelism for library calls). - tensorflow

author	A. Unique TensorFlower <gardener@tensorflow.org>	2017-10-11 14:45:28 -0700
committer	TensorFlower Gardener <gardener@tensorflow.org>	2017-10-11 14:49:20 -0700
commit	10d0ae696c7b5618cae9e3845af8300fe62870a2 (patch)
tree	03b971813653820bce5f366bdebd0aa01f687a73 /tensorflow/compiler/tf2xla/kernels/gather_op.cc
parent	f640c8980571d7578e891ea5ceab55978c8db9b4 (diff)

[XLA:CPU] Adds intra-op parallelism to the "sequential" CPU backend (which already has intra-op parallelism for library calls).

Adds support for parallel task assignment to instructions in entry (or embedded) computations.
Adds code to emit calls to a new runtime parallel fork/join function for instructions which have been assigned parallel tasks.
Adds a simple cost model for I/O bound instructions.

*) Translation (deleuze model) wall time (seconds).
               large_model   small_model   small_model_small_attn
  sequential:  0.00556       0.00484       0.00155
  parallel:    0.00263       0.00163       0.00106

*) Wavenet
  sequential: Avg. latency (30 runs): 1026.13ms, min/max: 988/1108ms
  parallel:   Avg. latency (30 runs): 800.633ms, min/max: 785/818ms

*) ParallelFusion benchmark.
  Benchmark                            Time(ns)    CPU(ns)   Iterations
  ----------------------------------------------------------------------
  sequential cpu backend (at head)       610584     611467         1000
  parallel cpu backend                   153241     836097         4528
  sequential cpu backend (this CL)       113482     679535         6017

PiperOrigin-RevId: 171877766
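For readers unfamiliar with the pattern, the idea of assigning parallel tasks from a simple I/O cost model and then dispatching them through a fork/join call can be sketched roughly as follows. This is only an illustration and not the code added by this change: `TargetTaskCount`, `ParallelForkJoin`, the `min_bytes_per_task` threshold, and the one-thread-per-task strategy are all hypothetical names and assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical cost model (an assumption, not XLA's): give each task at least
// `min_bytes_per_task` bytes of I/O, capped by the number of workers.
int64_t TargetTaskCount(int64_t bytes_accessed, int64_t min_bytes_per_task,
                        int64_t max_workers) {
  if (bytes_accessed <= 0 || min_bytes_per_task <= 0) return 1;
  const int64_t by_cost = bytes_accessed / min_bytes_per_task;
  return std::max<int64_t>(1, std::min(by_cost, max_workers));
}

// Hypothetical fork/join helper: splits [0, n) into up to `tasks` contiguous
// ranges, runs `body(begin, end)` on each range, and returns only after every
// range has finished.
void ParallelForkJoin(int64_t n, int64_t tasks,
                      const std::function<void(int64_t, int64_t)>& body) {
  assert(n >= 0);
  tasks = std::max<int64_t>(1, std::min(tasks, n));
  const int64_t chunk = (n + tasks - 1) / tasks;  // ceiling division

  // Fork: ranges 1..tasks-1 run on new threads.
  std::vector<std::thread> workers;
  for (int64_t t = 1; t < tasks; ++t) {
    const int64_t begin = std::min(n, t * chunk);
    const int64_t end = std::min(n, begin + chunk);
    if (begin < end) workers.emplace_back(body, begin, end);
  }

  // The calling thread processes the first range itself.
  body(0, std::min(n, chunk));

  // Join: wait for all forked ranges to complete.
  for (std::thread& w : workers) w.join();
}
```

A caller would compute `TargetTaskCount` once per instruction and pass the result to `ParallelForkJoin` together with a loop body over an index range. Instructions that touch too little data get a task count of 1 and run sequentially on the calling thread, which avoids paying thread overhead for small operations.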

Diffstat (limited to 'tensorflow/compiler/tf2xla/kernels/gather_op.cc')

0 files changed, 0 insertions, 0 deletions

