author | A. Unique TensorFlower <gardener@tensorflow.org> | 2017-10-11 14:45:28 -0700
committer | TensorFlower Gardener <gardener@tensorflow.org> | 2017-10-11 14:49:20 -0700
commit | 10d0ae696c7b5618cae9e3845af8300fe62870a2
tree | 03b971813653820bce5f366bdebd0aa01f687a73 /tensorflow/compiler/tf2xla/kernels/gather_op.cc
parent | f640c8980571d7578e891ea5ceab55978c8db9b4
[XLA:CPU] Adds intra-op parallelism to the "sequential" CPU backend (which already has intra-op parallelism for library calls).
Adds support for parallel task assignment to instructions in entry (or embedded) computations.
Adds code to emit calls to a new runtime parallel fork/join function for instructions that have been assigned parallel tasks.
Adds a simple cost model for I/O bound instructions.
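The commit message does not show the emitted code, the runtime function's signature, or the cost model's formula. As a rough illustration only, the C++ sketch below pairs a bytes-based cost model for an I/O-bound instruction with a fork/join helper that splits the work into contiguous ranges, runs them concurrently, and waits for all of them. `AssignParallelTaskCount`, `ParallelForkJoin`, and the 1 MiB threshold are hypothetical names and values. They are not XLA's actual runtime API or cost model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical threshold: minimum bytes an I/O-bound instruction must touch
// per task to justify forking (an assumption, not XLA's real value).
constexpr int64_t kMinBytesPerTask = int64_t{1} << 20;  // 1 MiB

// Hypothetical cost model for an I/O-bound instruction: one task per
// kMinBytesPerTask bytes accessed, at least 1 and at most max_threads.
int64_t AssignParallelTaskCount(int64_t bytes_accessed, int64_t max_threads) {
  assert(max_threads >= 1);
  const int64_t tasks = bytes_accessed / kMinBytesPerTask;
  return std::max<int64_t>(1, std::min(tasks, max_threads));
}

// Hypothetical fork/join helper: splits [0, n) into num_tasks contiguous
// ranges and runs body(begin, end) for each range. Ranges 1.. run on new
// threads and range 0 runs on the calling thread. All threads are joined
// before the function returns.
void ParallelForkJoin(int64_t n, int64_t num_tasks,
                      const std::function<void(int64_t, int64_t)>& body) {
  assert(n >= 0);
  // Never create more tasks than there are elements (but always at least 1).
  num_tasks = std::max<int64_t>(1, std::min(num_tasks, std::max<int64_t>(n, 1)));
  const int64_t chunk = n / num_tasks;
  const int64_t remainder = n % num_tasks;

  // The first `remainder` ranges get one extra element each.
  std::vector<std::pair<int64_t, int64_t>> ranges;
  int64_t begin = 0;
  for (int64_t t = 0; t < num_tasks; ++t) {
    const int64_t end = begin + chunk + (t < remainder ? 1 : 0);
    ranges.emplace_back(begin, end);
    begin = end;
  }

  // Fork: every range except the first gets its own thread.
  std::vector<std::thread> workers;
  for (size_t t = 1; t < ranges.size(); ++t) {
    workers.emplace_back(body, ranges[t].first, ranges[t].second);
  }
  // The calling thread handles the first range instead of idling.
  body(ranges[0].first, ranges[0].second);
  // Join: wait for every forked task to finish.
  for (std::thread& w : workers) w.join();
}
```

For example, an element-wise add over 1000 floats can call `ParallelForkJoin(1000, 4, ...)` with a lambda that writes `c[i] = a[i] + b[i]` for its `[begin, end)` range. Each task writes a disjoint range, so no locking is needed. A production runtime would more likely dispatch onto an existing thread pool than spawn threads per call. Plain `std::thread` keeps the sketch self-contained.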
*) Translation (deleuze model), wall time (seconds):
                large_model  small_model  small_model_small_attn
    sequential      0.00556      0.00484                 0.00155
    parallel        0.00263      0.00163                 0.00106
*) Wavenet
sequential: Avg. latency (30 runs): 1026.13ms, min/max: 988/1108ms
parallel: Avg. latency (30 runs): 800.633ms, min/max: 785/818ms
*) ParallelFusion benchmark.
Benchmark                          Time(ns)   CPU(ns)  Iterations
-----------------------------------------------------------------
sequential cpu backend (at head)     610584    611467        1000
parallel cpu backend                 153241    836097        4528
sequential cpu backend (this CL)     113482    679535        6017
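The benchmark results are easier to compare as ratios. The C++ sketch below computes baseline/candidate speedups from the quoted wall times. It takes each "sequential" row as the baseline (for ParallelFusion, "sequential cpu backend (at head)"). The message does not say which build the Translation and Wavenet "sequential" rows refer to, so read those ratios only as sequential-vs-parallel. `Measurement`, `Speedup`, and the other names are illustrative.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One baseline/candidate pair of wall times taken from the benchmarks above.
// Units cancel in the ratio, so seconds, ms, and ns can be mixed across rows.
struct Measurement {
  std::string name;
  double baseline;   // the "sequential" row
  double candidate;  // the "parallel" or "this CL" row
};

// Speedup of candidate over baseline: values above 1.0 mean candidate is faster.
double Speedup(const Measurement& m) {
  assert(m.candidate > 0.0);
  return m.baseline / m.candidate;
}

// Wall-time figures as quoted in the commit message.
std::vector<Measurement> ReportedMeasurements() {
  return {
      {"translation large_model", 0.00556, 0.00263},
      {"translation small_model", 0.00484, 0.00163},
      {"translation small_model_small_attn", 0.00155, 0.00106},
      {"wavenet avg latency", 1026.13, 800.633},
      {"parallel_fusion parallel backend", 610584, 153241},
      {"parallel_fusion sequential (this CL)", 610584, 113482},
  };
}

// Prints one line per measurement with its speedup to two decimals.
void PrintSpeedups() {
  for (const Measurement& m : ReportedMeasurements()) {
    std::printf("%-40s %.2fx\n", m.name.c_str(), Speedup(m));
  }
}
```

Calling `PrintSpeedups()` reports about 2.11x, 2.97x, and 1.46x for the three translation models and 1.28x for Wavenet. For ParallelFusion it reports 3.98x (parallel backend) and 5.38x (sequential backend with this CL), each relative to the sequential backend at head.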
PiperOrigin-RevId: 171877766
Diffstat (limited to 'tensorflow/compiler/tf2xla/kernels/gather_op.cc')
0 files changed, 0 insertions, 0 deletions