Diffstat (limited to 'tensorflow/docs_src/performance/xla/developing_new_backend.md')
-rw-r--r--  tensorflow/docs_src/performance/xla/developing_new_backend.md | 77
1 file changed, 0 insertions, 77 deletions
diff --git a/tensorflow/docs_src/performance/xla/developing_new_backend.md b/tensorflow/docs_src/performance/xla/developing_new_backend.md
deleted file mode 100644
index 840f6983c2..0000000000
--- a/tensorflow/docs_src/performance/xla/developing_new_backend.md
+++ /dev/null
@@ -1,77 +0,0 @@
-# Developing a new backend for XLA
-
-This preliminary guide is for early adopters who want to retarget TensorFlow
-to their hardware efficiently. The guide is not
-step-by-step and assumes knowledge of [LLVM](http://llvm.org),
-[Bazel](https://bazel.build/), and TensorFlow.
-
-XLA provides an abstract interface that a new architecture or accelerator can
-implement to create a backend to run TensorFlow graphs. Retargeting XLA should
-be significantly simpler and more scalable than implementing every existing
-TensorFlow Op for new hardware.
-
-Most implementations will fall into one of the following scenarios:
-
-1. Existing CPU architecture not yet officially supported by XLA, with or
- without an existing [LLVM](http://llvm.org) backend.
-2. Non-CPU-like hardware with an existing LLVM backend.
-3. Non-CPU-like hardware without an existing LLVM backend.
-
-> Note: An LLVM backend can mean either one of the officially released LLVM
-> backends or a custom LLVM backend developed in-house.
-
-## Scenario 1: Existing CPU architecture not yet officially supported by XLA
-
-In this scenario, start by looking at the existing
-[XLA CPU backend](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/).
-XLA makes it easy to retarget TensorFlow to different CPUs by using LLVM, since
-the main difference between XLA backends for CPUs is the code generated by LLVM.
-Google tests XLA for x64 and ARM64 architectures.
-
-If the hardware vendor has an LLVM backend for their hardware, it is simple to
-link that backend against the LLVM build used by XLA. In JIT mode, the XLA CPU backend
-emits code for the host CPU. For ahead-of-time compilation,
-[`xla::AotCompilationOptions`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h)
-can provide an LLVM triple to configure the target architecture.
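-
-As a non-authoritative sketch, assuming a hypothetical target triple and entry
-point name for the new architecture, the ahead-of-time options might be
-constructed roughly as follows (check `CpuAotCompilationOptions` in the
-current sources for the exact constructor arguments):
-
-```c++
-#include "tensorflow/compiler/xla/service/cpu/cpu_compiler.h"
-
-// Builds AOT options that point LLVM at the vendor's target. All string
-// values here are hypothetical placeholders for the new architecture.
-xla::cpu::CpuAotCompilationOptions MakeAotOptionsForMyTarget() {
-  return xla::cpu::CpuAotCompilationOptions(
-      /*triple=*/"myisa64-unknown-linux-gnu",  // vendor's LLVM triple
-      /*cpu_name=*/"",                         // optional CPU model
-      /*features=*/"",                         // optional feature string
-      /*entry_point_name=*/"entry",
-      xla::cpu::CpuAotCompilationOptions::RelocationModel::Static);
-}
-
-// The options are then passed to the compile-only client (for example via
-// xla::CompileOnlyClient::CompileAheadOfTime) so that object code is
-// generated for the target triple rather than for the host.
-```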
-
-If there is no existing LLVM backend but another kind of code generator exists,
-it should be possible to reuse most of the existing CPU backend.
-
-## Scenario 2: Non-CPU-like hardware with an existing LLVM backend
-
-It is possible to model a new
-[`xla::Compiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h)
-implementation on the existing
-[`xla::cpu::CpuCompiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/cpu_compiler.cc)
-and
-[`xla::gpu::NVPTXCompiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc)
-classes, since these already emit LLVM IR. Depending on the nature of the
-hardware, it is possible that many of the LLVM IR generation aspects will have
-to be changed, but a lot of code can be shared with the existing backends.
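-
-As a non-authoritative sketch, assuming a hypothetical `MyDeviceCompiler`
-class, `mydevice` namespace, and `kMyPlatformId` platform id, the outline of
-such a compiler might look roughly like this (the virtual-method signatures
-are approximate; the authoritative interface is the `xla::Compiler`
-declaration in `compiler.h`):
-
-```c++
-#include <memory>
-
-#include "tensorflow/compiler/xla/service/compiler.h"
-
-namespace se = stream_executor;
-
-namespace mydevice {  // hypothetical namespace for the new backend
-
-extern const se::Platform::Id kMyPlatformId;  // hypothetical platform id
-
-// Skeleton modeled on xla::cpu::CpuCompiler and xla::gpu::NVPTXCompiler.
-class MyDeviceCompiler : public xla::Compiler {
- public:
-  // Identifies the platform this compiler generates code for.
-  se::Platform::Id PlatformId() const override;
-
-  // Runs target-independent and target-specific HLO optimization passes.
-  xla::StatusOr<std::unique_ptr<xla::HloModule>> RunHloPasses(
-      std::unique_ptr<xla::HloModule> module, se::StreamExecutor* executor,
-      xla::DeviceMemoryAllocator* device_allocator) override;
-
-  // Lowers the optimized HLO to LLVM IR, hands it to the existing LLVM
-  // backend for the target, and wraps the result in an xla::Executable.
-  xla::StatusOr<std::unique_ptr<xla::Executable>> RunBackend(
-      std::unique_ptr<xla::HloModule> module, se::StreamExecutor* executor,
-      xla::DeviceMemoryAllocator* device_allocator) override;
-
-  // The remaining xla::Compiler virtual methods (Compile, CompileAheadOfTime,
-  // ShapeSizeBytesFunction) are omitted here but must also be implemented.
-};
-
-}  // namespace mydevice
-```
-
-The new compiler is then made discoverable by registering a factory for its
-platform id with `xla::Compiler::RegisterCompilerFactory`, typically from a
-static initializer, as the existing CPU and GPU compilers do.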
-
-A good example to follow is the
-[GPU backend](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/)
-of XLA. The GPU backend targets a non-CPU-like ISA, and therefore some aspects
-of its code generation are unique to the GPU domain. Other kinds of hardware,
-e.g. DSPs like Hexagon (which has an upstream LLVM backend), can reuse parts of
-the LLVM IR emission logic, but other parts will be unique.
-
-## Scenario 3: Non-CPU-like hardware without an existing LLVM backend
-
-If it is not possible to use LLVM, then the best option is to implement an
-entirely new XLA backend for the desired hardware. This option requires the
-most effort. The classes that need to be implemented are as follows (a minimal
-skeleton sketch follows the list):
-
-* [`StreamExecutor`](https://www.tensorflow.org/code/tensorflow/stream_executor/stream_executor.h):
- For many devices not all methods of `StreamExecutor` are needed. See
- existing `StreamExecutor` implementations for details.
-* [`xla::Compiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h):
- This class encapsulates the compilation of an HLO computation into an
- `xla::Executable`.
-* [`xla::Executable`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/executable.h):
- This class is used to launch a compiled computation on the platform.
-* [`xla::TransferManager`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/transfer_manager.h):
- This class enables backends to provide platform-specific mechanisms for
- constructing XLA literal data from given device memory handles. In other
- words, it helps encapsulate the transfer of data from the host to the device
- and back.
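-
-As a minimal, non-authoritative sketch, the pieces fit together roughly as
-follows; the `mydevice` namespace and `MyDevice*` class names are
-hypothetical, and the base classes are the ones linked above:
-
-```c++
-#include "tensorflow/compiler/xla/service/compiler.h"
-#include "tensorflow/compiler/xla/service/executable.h"
-#include "tensorflow/compiler/xla/service/transfer_manager.h"
-
-namespace mydevice {  // hypothetical namespace for the new backend
-
-// Wraps the device-specific artifact produced by the compiler and knows how
-// to launch it on the platform.
-class MyDeviceExecutable : public xla::Executable { /* overrides omitted */ };
-
-// Compiles an HLO computation into a MyDeviceExecutable, without LLVM.
-class MyDeviceCompiler : public xla::Compiler { /* overrides omitted */ };
-
-// Builds XLA literals from device memory handles and writes literals back to
-// the device, i.e. the host-to-device and device-to-host transfers.
-class MyDeviceTransferManager : public xla::TransferManager {
-  /* overrides omitted */
-};
-
-}  // namespace mydevice
-
-// Device plumbing (streams, memory allocation, kernel launch) is supplied by
-// the platform's StreamExecutor implementation. The compiler and transfer
-// manager are registered per platform id, via
-// xla::Compiler::RegisterCompilerFactory and
-// xla::TransferManager::RegisterTransferManager respectively.
-```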