[XLA:GPU] Mark loads as invariant where appropriate.

If we read a value within an HLO that isn't modified by that HLO, mark it as invariant in LLVM IR. LLVM can perform more aggressive optimizations on invariant loads, but I don't expect this will help much in our case, because XLA already emits pretty aggressive noalias information on loads and stores.

However, on NVIDIA GPUs, marking loads as invariant has the additional benefit of allowing LLVM to lower the load as ld.global.nc (equivalent to CUDA's __ldg). This instruction uses a special cache on the GPU, and it's usually faster than a regular load.

PiperOrigin-RevId: 175774979
author: Justin Lebar <jlebar@google.com> 2017-11-14 20:12:00 -0800
committer: TensorFlower Gardener <gardener@tensorflow.org> 2017-11-14 20:16:38 -0800
commit: 24a6162d2d5fad078157e2ec514f2fbb7ee0c676 (patch)
tree: 602b458f606e8e374f9d27862a5aa6ca4b67a381 /tensorflow/compiler/xla/service/buffer_assignment.h
parent: c798e04fbd24809d0bc52d4b80f30e74418b8f4d (diff)
1 files changed, 6 insertions, 0 deletions
diff --git a/tensorflow/compiler/xla/service/buffer_assignment.h b/tensorflow/compiler/xla/service/buffer_assignment.h
index 08a53af8ba..08a40bfeb2 100644
--- a/tensorflow/compiler/xla/service/buffer_assignment.h
+++ b/tensorflow/compiler/xla/service/buffer_assignment.h
@@ -327,6 +327,12 @@ class BufferAssignment {
     return SharesSliceAtIndex(hlo_a, {}, hlo_b, {});
   }
 
+  // Returns true if hlo_a and hlo_b both have at least one buffer assigned for
+  // their top-level and each of their nested shape indices, and if hlo_a's
+  // buffers are all different from hlo_b's buffers.
+  bool HaveDisjointSlices(const HloInstruction* hlo_a,
+                          const HloInstruction* hlo_b) const;
+
   // Returns the underlying points-to analysis used for this assignment.
   const TuplePointsToAnalysis& points_to_analysis() const {
     return liveness_->points_to_analysis();
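
The contract documented for HaveDisjointSlices can be illustrated with a small standalone sketch. This is not the XLA implementation: the `Slice` struct, `Overlaps`, and `HaveDisjointSlicesSketch` are hypothetical names, and the sketch assumes that "different" buffers means non-overlapping byte ranges within an allocation. Presumably a load from an operand could be treated as invariant only when a check of this kind succeeds against the HLO's output, though this hunk does not show the call site.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffer slice: an allocation index plus a
// byte range inside that allocation. Not an XLA type.
struct Slice {
  int64_t allocation;
  int64_t offset;
  int64_t size;
};

// Returns true if both slices live in the same allocation and their byte
// ranges intersect. (Assumption: overlap, not just equality, counts as
// "not different".)
inline bool Overlaps(const Slice& a, const Slice& b) {
  return a.allocation == b.allocation &&
         a.offset < b.offset + b.size &&
         b.offset < a.offset + a.size;
}

// Mirrors the documented contract: both sides must have at least one slice
// assigned, and no slice of `a` may overlap any slice of `b`.
inline bool HaveDisjointSlicesSketch(const std::vector<Slice>& a,
                                     const std::vector<Slice>& b) {
  if (a.empty() || b.empty()) {
    return false;  // Nothing assigned: cannot prove disjointness.
  }
  for (const Slice& sa : a) {
    for (const Slice& sb : b) {
      if (Overlaps(sa, sb)) {
        return false;
      }
    }
  }
  return true;
}
```

Returning false when either side has no slices is the conservative choice: without an assignment there is nothing to prove the buffers apart, so a caller would fall back to an ordinary load.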