path: root/tensorflow/compiler/xla/service/cpu/simple_orc_jit.cc
Commit history for this file (message, author, date):
* Implement sort op for CPU. (Adrian Kuegel, 2018-09-19)
  Also don't allow parallelization for the sort op in parallel_task_assignment.
  PiperOrigin-RevId: 213592046
* [XLA] Use absl string types and functions instead of the TF versions. (Justin Lebar, 2018-08-23)
  Unfortunately this has to be one big patch, because e.g. absl::StrCat doesn't accept a TF StringPiece, but as soon as we switch to absl::string_view, we have to switch away from all of the TF functions.
  PiperOrigin-RevId: 209957896
* [XLA] Use absl::make_unique instead of xla::MakeUnique. (Justin Lebar, 2018-08-20)
  Same for WrapUnique.
  PiperOrigin-RevId: 209531124
* Delete ExternalConstantPool. (Adrian Kuegel, 2018-06-26)
  PiperOrigin-RevId: 202090038
* Merge changes from github. (Akshay Modi, 2018-06-18)
  PiperOrigin-RevId: 201110240
* Automated g4 rollback of changelist 201011811. (Akshay Modi, 2018-06-18)
  PiperOrigin-RevId: 201033171
* Merge changes from github. (Akshay Modi, 2018-06-18)
  PiperOrigin-RevId: 201011811
* Adapt to the LLVM ORC interface change in r332541. (Eric Liu, 2018-05-17)
  PiperOrigin-RevId: 196978634
* Don't call into Eigen unless the input and output tensors are aligned. (Sanjoy Das, 2018-05-09)
  We teach TargetMachineFeatures about the alignment required for Eigen GEMM and Conv, and then pipe TargetMachineFeatures through the places that need to decide whether a dot or a conv needs to be lowered to a call to Eigen.
  I also had to fix a minor bug in our LLVM IR implementation for convolution.
  PiperOrigin-RevId: 196065557
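  Illustrative sketch (not from the commit): the decision described above, reduced to its core predicate. The function name, the constant, and the exact alignment value are assumptions for illustration, not the TargetMachineFeatures API; the real backend derives operand alignments at compile time.

      #include <cstdint>

      // Assumed minimum alignment for Eigen's packed GEMM/Conv kernels.
      constexpr int64_t kAssumedEigenMinAlignment = 16;

      // A dot/conv is lowered to an Eigen runtime call only when every operand
      // buffer and the output buffer are known to be sufficiently aligned;
      // otherwise the backend keeps its own LLVM IR implementation.
      bool CanLowerToEigen(int64_t lhs_alignment, int64_t rhs_alignment,
                           int64_t out_alignment) {
        return lhs_alignment >= kAssumedEigenMinAlignment &&
               rhs_alignment >= kAssumedEigenMinAlignment &&
               out_alignment >= kAssumedEigenMinAlignment;
      }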
* [TF:XLA] Add INTEL MKL_DNN Conv2d method to XLA/CPU backend. (Tony Wang, 2018-04-26)
  INTEL MKL_DNN provides a 32-bit Conv2d method. With the INTEL_MKL flag set, the XLA backend emits a runtime call to MKL_DNN Conv2d instead of Eigen.
  PiperOrigin-RevId: 194445212
* Automated g4 rollback of changelist 191605505. (Tony Wang, 2018-04-05)
  PiperOrigin-RevId: 191824447
* Automated g4 rollback of changelist 191527251. (Tony Wang, 2018-04-04)
  PiperOrigin-RevId: 191605505
* [TF:XLA] Add INTEL_MKL_ML MatMul method to XLA/CPU backend. (Tony Wang, 2018-04-03)
  The INTEL GEMM API provides 32-bit and 64-bit MatMul. With the INTEL_MKL flag set, the XLA backend emits a runtime call to the INTEL GEMM MatMul instead of Eigen.
  PiperOrigin-RevId: 191527251
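  Illustrative sketch (not from the commit): what a single-precision runtime GEMM call into MKL's CBLAS interface might look like. The wrapper name, the row-major layout, and the alpha/beta choices are assumptions, not the actual XLA runtime symbol.

      #include <mkl_cblas.h>

      // Computes out = lhs (m x k) * rhs (k x n) for dense row-major buffers
      // by delegating to MKL's cblas_sgemm.
      void MklMatMulF32(const float* lhs, const float* rhs, float* out,
                        int m, int n, int k) {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                    /*alpha=*/1.0f, lhs, /*lda=*/k, rhs, /*ldb=*/n,
                    /*beta=*/0.0f, out, /*ldc=*/n);
      }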
* Update LLVM API usage to match upstream change. (A. Unique TensorFlower, 2018-04-03)
  PiperOrigin-RevId: 191428965
* [XLA] FP16 Dot support for the CPU and GPU backends. (Bixia Zheng, 2018-02-28)
  Extend the stream interface ThenBlasGemmWithAlgorithm to support F16 matrix multiplication with computation type FP32.
  Extend the stream executor interface DoBlasGemmWithAlgorithm to support F16 GEMM with computation type FP32.
  Extend the CPU IR emitter to handle the F16 Dot instruction, and add an F16 matrix multiplication implementation to the CPU runtime.
  Extend the GPU backend to handle the FP16 GEMM Thunk.
  Replicate the existing matrix multiplication test cases in matrix_ops_simple_test and dot_operation_test for FP16.
  RELNOTES:
  PiperOrigin-RevId: 187369731
* [XLA:CPU] Add FP32<->FP16 conversion routines. (Sanjoy Das, 2018-02-20)
  LLVM generates calls to these functions when lowering some fp16 operations on certain architectures. These symbols are defined in compiler-rt, but we don't always link to compiler-rt, so these symbols are sometimes absent.
  This change adds __gnu_f2h_ieee and __gnu_h2f_ieee as weak symbols. Making them weak ensures that we are able to build successfully even when linking to a compiler-rt that defines these symbols.
  PiperOrigin-RevId: 186416684
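  Illustrative sketch (not from the commit): the weak-symbol mechanism described above. The function bodies here are simplified stand-ins that only handle normal finite values and truncate instead of rounding; the actual runtime implements full IEEE binary16 conversion.

      #include <cstdint>
      #include <cstring>

      extern "C" {

      // Weak definition: if a linked-in compiler-rt provides a strong
      // __gnu_h2f_ieee, the linker prefers that one.
      __attribute__((weak)) float __gnu_h2f_ieee(uint16_t h) {
        const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
        const uint32_t exponent = (h >> 10) & 0x1fu;
        const uint32_t mantissa = h & 0x3ffu;
        // Rebias the exponent from binary16 (bias 15) to binary32 (bias 127).
        const uint32_t bits =
            sign | ((exponent + 112u) << 23) | (mantissa << 13);
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
      }

      __attribute__((weak)) uint16_t __gnu_f2h_ieee(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        const uint32_t sign = (bits >> 16) & 0x8000u;
        const uint32_t exponent = ((bits >> 23) & 0xffu) - 112u;  // rebias 127 -> 15
        const uint32_t mantissa = (bits >> 13) & 0x3ffu;          // truncated, not rounded
        return static_cast<uint16_t>(sign | (exponent << 10) | mantissa);
      }

      }  // extern "C"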
* [XLA:CPU] Minor cleanup to simple_orc_jit. (Sanjoy Das, 2018-02-16)
  SimpleResolver became unused after an LLVM upstream merge, and we never needed the name mangling logic in what is now FindCompiledSymbol.
  PiperOrigin-RevId: 186039307
* Adapt to API changes in LLVM revisions r325155 and r325180. (Benjamin Kramer, 2018-02-16)
  PiperOrigin-RevId: 185979538
* Enable half precision convolution for the CPU and GPU backends. (Bixia Zheng, 2018-02-15)
  Enhance the CPU IR emitter to support the F16 dot operation and convolution operation.
  Add a CPU runtime implementation for F16 convolution.
  Enhance the GPU backend to handle the F16 convolution thunk.
  Convert some F32 xla convolution tests to support both F32 and F16, and disable the tests for the CPU backend due to b/72509305.
  PiperOrigin-RevId: 185862438
* [XLA:CPU] Implement vectorized Log in LLVM IR. (Sanjoy Das, 2018-02-12)
  This was the last vectorized intrinsic for which we had to call into C++, so also remove the associated machinery.
  PiperOrigin-RevId: 185482962
* Update for LLVM API change r324700. (A. Unique TensorFlower, 2018-02-09)
  PiperOrigin-RevId: 185149198
* Update for LLVM API change r324405. (A. Unique TensorFlower, 2018-02-08)
  PiperOrigin-RevId: 185016276
* [XLA:CPU] Fix/suppress issues caught by the C++ linter. (Sanjoy Das, 2018-02-07)
  PiperOrigin-RevId: 184856538
* [XLA:CPU] Add an LLVM IR implementation of Exp. (Sanjoy Das, 2018-02-06)
  This lets us avoid the usual set of issues that crop up when XLA generated code has to call into C++.
  PiperOrigin-RevId: 184793093
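  Illustrative sketch (not from the commit): the general shape of emitting a transcendental such as Exp or Log directly as LLVM IR, rather than calling a C++ routine, is to emit a polynomial approximation inline with IRBuilder. The helper below evaluates an arbitrary polynomial via Horner's rule; the coefficients would come from a minimax fit, and the surrounding range reduction is omitted.

      #include "llvm/ADT/ArrayRef.h"
      #include "llvm/IR/Constants.h"
      #include "llvm/IR/IRBuilder.h"

      // Emits IR computing coeffs[0]*x^(n-1) + ... + coeffs[n-1] at `x`.
      // Works for scalar and vector floating-point types alike, since
      // ConstantFP::get splats the coefficient over vector types.
      llvm::Value* EmitPolynomial(llvm::IRBuilder<>* b, llvm::Value* x,
                                  llvm::ArrayRef<double> coeffs) {
        llvm::Type* ty = x->getType();
        llvm::Value* acc = llvm::ConstantFP::get(ty, coeffs.front());
        for (double c : coeffs.drop_front()) {
          // Horner's rule: acc = acc * x + c.
          acc = b->CreateFAdd(b->CreateFMul(acc, x),
                              llvm::ConstantFP::get(ty, c));
        }
        return acc;
      }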
* Update XLA for LLVM r323001. (Benjamin Kramer, 2018-01-20)
  This will require an LLVM version bump.
  PiperOrigin-RevId: 182661291
* Merge changes from github. (Raghuraman Krishnamoorthi, 2018-01-03)
  PiperOrigin-RevId: 180746153
* Automated g4 rollback of changelist 180000981. (A. Unique TensorFlower, 2018-01-02)
  PiperOrigin-RevId: 180581912
* Merge changes from github. (Patrick Nguyen, 2017-12-28)
  PiperOrigin-RevId: 180301735
* Automated g4 rollback of changelist 179983419. (A. Unique TensorFlower, 2017-12-23)
  PiperOrigin-RevId: 180000981
* Adds FFT for XLA: CPU via Eigen, GPU via cuFFT. (A. Unique TensorFlower, 2017-12-22)
  GPU support includes plan reuse, with a new scratch allocator per execution in fft_thunk.
  PiperOrigin-RevId: 179983419
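  Illustrative sketch (not from the commit): one way the "reuse the plan, re-supply scratch per execution" pattern looks with the cuFFT API. The helper names, the 1-D complex-to-complex shape, and the omitted error handling are assumptions for illustration, not fft_thunk's actual code.

      #include <cufft.h>

      // Build the plan once, with cuFFT's internal work-area allocation
      // disabled so the caller can provide scratch memory per execution.
      cufftHandle MakeReusableC2CPlan(int n, size_t* work_size) {
        cufftHandle plan;
        cufftCreate(&plan);
        cufftSetAutoAllocation(plan, 0);
        cufftMakePlan1d(plan, n, CUFFT_C2C, /*batch=*/1, work_size);
        return plan;
      }

      // Each execution points the cached plan at freshly allocated scratch.
      void RunForward(cufftHandle plan, void* scratch,
                      cufftComplex* in, cufftComplex* out) {
        cufftSetWorkArea(plan, scratch);
        cufftExecC2C(plan, in, out, CUFFT_FORWARD);
      }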
* Merge changes from github. (A. Unique TensorFlower, 2017-12-22)
  PiperOrigin-RevId: 179953488
* Merge changes from github. (Sourabh Bajaj, 2017-11-30)
  PiperOrigin-RevId: 177526301
* Expose an Orc JIT memory mapper registry. (Sanjoy Das, 2017-11-15)
  XLA clients can use this registry to inject client-specific behavior into how Orc JIT manages virtual memory.
  PiperOrigin-RevId: 175905401
* Roll forward CL 171084886. (Sanjoy Das, 2017-10-24)
  171084886 had to be rolled back twice due to various open source build issues. I'm trying again, now that I think I've addressed all the pertinent issues.
  Original CL description:
    Don't use dlsym to resolve symbols in the CPU JIT
    Instead of resolving symbols via dlsym when JITting for the CPU backend, use a registry based mechanism. This lets us kill off the --export_dynamic hack that we used to need for CustomCall on the CPU backend.
  PiperOrigin-RevId: 173277862
* Automated g4 rollback of changelist 171877766. (A. Unique TensorFlower, 2017-10-16)
  PiperOrigin-RevId: 172325692
* Automated g4 rollback of changelist 171877766. (Anna R, 2017-10-11)
  PiperOrigin-RevId: 171915087
* [XLA:CPU] Adds intra-op parallelism to the "sequential" CPU backend (which already has intra-op parallelism for library calls). (A. Unique TensorFlower, 2017-10-11)
  Adds support for parallel task assignment to instructions in entry (or embedded) computations.
  Adds code to emit calls to a new runtime parallel fork/join function for instructions which have been assigned parallel tasks.
  Adds a simple cost model for I/O bound instructions.

  *) Translation (deleuze model) wall time (seconds).
                   large_model   small_model   small_model_small_attn
     sequential:   0.00556       0.00484       0.00155
     parallel:     0.00263       0.00163       0.00106

  *) Wavenet
     sequential: Avg. latency (30 runs): 1026.13ms, min/max: 988/1108ms
     parallel:   Avg. latency (30 runs): 800.633ms, min/max: 785/818ms

  *) ParallelFusion benchmark.
     Benchmark                           Time(ns)   CPU(ns)   Iterations
     --------------------------------------------------------------------
     sequential cpu backend (at head)      610584    611467         1000
     parallel cpu backend                  153241    836097         4528
     sequential cpu backend (this CL)      113482    679535         6017

  PiperOrigin-RevId: 171877766
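  Illustrative sketch (not from the commit): the fork/join shape that such a runtime call implements. The function name and the thread-per-task strategy are assumptions; the real runtime would use the backend's thread pool rather than spawning raw threads.

      #include <algorithm>
      #include <cstdint>
      #include <functional>
      #include <thread>
      #include <vector>

      // Splits [0, total_elements) into contiguous partitions, runs the
      // compiled kernel on each partition concurrently, and joins before
      // returning so the instruction's semantics are unchanged.
      void ParallelForkJoin(
          const std::function<void(int64_t start, int64_t limit)>& kernel,
          int64_t total_elements, int num_tasks) {
        std::vector<std::thread> workers;
        const int64_t chunk = (total_elements + num_tasks - 1) / num_tasks;
        for (int t = 0; t < num_tasks; ++t) {
          const int64_t start = t * chunk;
          const int64_t limit = std::min(start + chunk, total_elements);
          if (start >= limit) break;
          workers.emplace_back(kernel, start, limit);
        }
        for (std::thread& w : workers) w.join();
      }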
* Use an external constant pool to reduce LLVM compile times. (Sanjoy Das, 2017-10-09)
  LLVM does not deal well with huge arrays emitted inline into the IR. In JIT mode, this change teaches XLA to emit large constant tensors onto a side data structure, which are then symbolically linked to the generated executable.
  It is important to note that this works only in JIT mode, and my current understanding is that making this work reliably in AOT will be somewhat more difficult.
  PiperOrigin-RevId: 171626043
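  Illustrative sketch (not from the commit, and not the since-deleted ExternalConstantPool API): the side data structure described above, reduced to a name-to-buffer map. The IR then only declares an external global for each stored constant, and the JIT's symbol resolver answers lookups for those names with the buffer addresses.

      #include <cstring>
      #include <map>
      #include <memory>
      #include <string>

      class ConstantPoolSketch {
       public:
        // Copies the literal out of the IR path and returns the symbol name
        // under which the JIT can later resolve it.
        std::string Insert(const void* data, size_t size_bytes) {
          std::string name =
              "__xla_constant_" + std::to_string(entries_.size());
          auto buffer = std::make_unique<char[]>(size_bytes);
          std::memcpy(buffer.get(), data, size_bytes);
          entries_[name] = std::move(buffer);
          return name;
        }

        // Called by the symbol resolver for otherwise-undefined references.
        const void* Find(const std::string& name) const {
          auto it = entries_.find(name);
          return it == entries_.end() ? nullptr : it->second.get();
        }

       private:
        std::map<std::string, std::unique_ptr<char[]>> entries_;
      };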
* Automated g4 rollback of changelist 171084886. (Sanjoy Das, 2017-10-05)
  PiperOrigin-RevId: 171221629
* Don't use dlsym to resolve symbols in the CPU JIT. (Sanjoy Das, 2017-10-04)
  Instead of resolving symbols via dlsym when JITting for the CPU backend, use a registry based mechanism. This lets us kill off the --export_dynamic hack that we used to need for CustomCall on the CPU backend.
  PiperOrigin-RevId: 171084886
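  Illustrative sketch (not from the commit): the registry-based alternative to dlsym in its simplest form. The class and method names are placeholders, not XLA's actual registration machinery; the point is that runtime functions record their addresses under well-known names, and the JIT consults the map instead of relying on dynamic symbol visibility.

      #include <map>
      #include <mutex>
      #include <string>

      class RuntimeSymbolRegistry {
       public:
        static RuntimeSymbolRegistry& Global() {
          static auto* instance = new RuntimeSymbolRegistry;
          return *instance;
        }

        // Typically invoked from static initializers in the runtime library.
        void Register(const std::string& name, void* address) {
          std::lock_guard<std::mutex> lock(mu_);
          symbols_[name] = address;
        }

        // The JIT's resolver calls this instead of dlsym.
        void* Lookup(const std::string& name) {
          std::lock_guard<std::mutex> lock(mu_);
          auto it = symbols_.find(name);
          return it == symbols_.end() ? nullptr : it->second;
        }

       private:
        std::mutex mu_;
        std::map<std::string, void*> symbols_;
      };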
* Automated g4 rollback of changelist 170892257. (Gunhan Gulsoy, 2017-10-03)
  PiperOrigin-RevId: 170919783
* Don't use dlsym to resolve symbols in the CPU JIT. (Sanjoy Das, 2017-10-03)
  Instead of resolving symbols via dlsym when JITting for the CPU backend, use a registry based mechanism. This lets us kill off the --export_dynamic hack that we used to need for CustomCall on the CPU backend.
  PiperOrigin-RevId: 170892257
* Add debug flag to disable expensive LLVM optimization passes. (A. Unique TensorFlower, 2017-08-28)
  RELNOTES: n/a
  PiperOrigin-RevId: 166766323
* CPU backend: support NEON intrinsics. (A. Unique TensorFlower, 2017-08-18)
  This adds log and exp for NEON. (tanh is already supported on all platforms via the LLVM IR runtime.)
  This change also fixes tf_library() to link the intrinsics into the binary.
  PiperOrigin-RevId: 165782270
* In fast-math mode, emit a tanh that has a faster min/max. (A. Unique TensorFlower, 2017-08-10)
  PiperOrigin-RevId: 164943597
* Add a compiler interface to inspect LLVM IR. (A. Unique TensorFlower, 2017-08-09)
  This change introduces an LLVMCompiler class, of which the CPU and GPU compilers are subclasses. The LLVMCompiler class provides the ability to inspect the LLVM IR the compiler generates by registering a callback. The callbacks can be used to analyze IR before and after optimizations.
  This also adds a simple test for the callback mechanism.
  PiperOrigin-RevId: 164805348
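  Illustrative sketch (not from the commit): what such an IR-inspection callback can look like. The ModuleHook alias and the MakePrintingHook helper are assumptions for illustration, not the real LLVMCompiler interface.

      #include <functional>

      #include "llvm/IR/Module.h"
      #include "llvm/Support/raw_ostream.h"

      // A hook receives each llvm::Module, e.g. once before and once after
      // the optimization pipeline runs.
      using ModuleHook = std::function<void(const llvm::Module&)>;

      // Example hook: dump the textual IR to stderr. A real client might
      // instead verify the module or archive the IR for later inspection.
      ModuleHook MakePrintingHook() {
        return [](const llvm::Module& module) {
          module.print(llvm::errs(), /*AAW=*/nullptr);
        };
      }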
* Add a configuration option for code size. (A. Unique TensorFlower, 2017-08-03)
  This change adds a CPU-specific flag: xla_cpu_optimize_for_size
  When this flag is passed, it changes the optimizers to run more or less analogously to LLVM's -Os optimizations.
  There are two things that turning on the code size optimization option controls:
    * the internal settings of some optimization passes (which is mostly controlled through a function attribute; see the sketch after this entry)
    * the passes that get run (which is decided by the pass manager)
  This change also refactors the code by reorganizing the way that CPU backend specific flags are queried, as well as some other minor refactoring.
  PiperOrigin-RevId: 164218771
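  Illustrative sketch (not from the commit): the function-attribute half of that behavior, simplified. Tagging functions with OptimizeForSize is how size-aware passes such as the inliner and loop unroller are steered toward smaller code; the flag plumbing and pass-manager changes are omitted.

      #include "llvm/IR/Attributes.h"
      #include "llvm/IR/Function.h"
      #include "llvm/IR/Module.h"

      // Marks every defined function in the module as size-optimized.
      void TagModuleForSize(llvm::Module* module) {
        for (llvm::Function& function : *module) {
          if (function.isDeclaration()) continue;  // nothing to tag
          function.addFnAttr(llvm::Attribute::OptimizeForSize);
        }
      }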
* Merged commit includes the following changes: (A. Unique TensorFlower, 2017-08-01)
  163914294 by annarev:
    Refactors build target for gradients_impl to allow code to depend on the gradient generation but not the gradients themselves.
  163913011 by A. Unique TensorFlower:
    Use an LLVM-IR version of vector hyperbolic tangent. This lets us:
      - Inline the routine where it is called, eliminating call overhead.
      - Use AVX instructions in JITed code even if TensorFlow was not built with -mavx.
  163909534 by A. Unique TensorFlower:
    Add tensorflow-android to standard TF maven artifacts.
  163908704 by A. Unique TensorFlower:
    Go: Update generated wrapper functions for TensorFlow ops.
  163907709 by A. Unique TensorFlower:
    Update ops-related pbtxt files.
  163907497 by A. Unique TensorFlower:
    Remove old TensorFlow Serving landing page in preparation for the new TF Serving landing page. Fix bad leftnav.
  163906225 by alive:
    Refactors build target for gradients_impl to allow code to depend on the gradient generation but not the gradients themselves.
  PiperOrigin-RevId: 163914294
* Update Dataset API documentation. (Jiri Simsa, 2017-07-27)
  PiperOrigin-RevId: 163349457
* Re-align asterisks for pointer types; NFC. (A. Unique TensorFlower, 2017-07-24)
  PiperOrigin-RevId: 163001060