Commit log

PiperOrigin-RevId: 211992206
PiperOrigin-RevId: 209686671
PiperOrigin-RevId: 209640734
HloModuleConfig.
PiperOrigin-RevId: 201284741
Previously, only one layout was stored with an HLO module. This CL allows
HLO passes to modify the on-device layouts without affecting the on-host
layout (provided by the client).
PiperOrigin-RevId: 195014875
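A minimal Python sketch of the split described in the commit above: the client-provided on-host layout stays fixed while passes rewrite only the on-device layout. All class and method names here are invented stand-ins, not the real XLA C++ API.

```python
class ShapeLayout:
    """Toy stand-in for a shape with a minor-to-major layout."""
    def __init__(self, dims, minor_to_major):
        self.dims = tuple(dims)
        self.minor_to_major = tuple(minor_to_major)

class EntryLayout:
    """Keeps two layouts per parameter: host (immutable) and device (mutable)."""
    def __init__(self, host_layout):
        self._host = host_layout
        # The device layout starts out identical to the host layout.
        self._device = ShapeLayout(host_layout.dims, host_layout.minor_to_major)

    @property
    def host(self):
        return self._host

    @property
    def device(self):
        return self._device

    def set_device_layout(self, minor_to_major):
        # Only the on-device layout may be rewritten by passes.
        self._device = ShapeLayout(self._device.dims, minor_to_major)

# A layout-assignment pass flips the device layout to column-major;
# the client-visible host layout is unchanged.
param = EntryLayout(ShapeLayout(dims=(2, 3), minor_to_major=(1, 0)))
param.set_device_layout((0, 1))
```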
PiperOrigin-RevId: 188803724
PiperOrigin-RevId: 186777369
PiperOrigin-RevId: 175217850
executables which only generated the array values of tuple-shaped outputs, not the tuple index tables. With cl/170133015, ShapedBuffers that hold the computation output now have materialized tuples with these index tables, so this option is no longer desired or necessary.
No functional change; just cleanup.
PiperOrigin-RevId: 171035738
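A hedged Python sketch of what "materializing the tuple index table" means in the commit above: a tuple-shaped output is stored as leaf array buffers plus, for each tuple level, a table of references to its element buffers. The function and buffer encoding are invented for illustration.

```python
def materialize(value, buffers):
    """Allocate a buffer for `value`; tuples get an index table of element ids."""
    if isinstance(value, tuple):
        # Allocate element buffers first, then the tuple's index table,
        # which holds the ids of its elements.
        element_ids = [materialize(v, buffers) for v in value]
        buffer_id = len(buffers)
        buffers.append(("index_table", element_ids))
    else:
        buffer_id = len(buffers)
        buffers.append(("array", value))
    return buffer_id

# A nested tuple output: (array, (array, array)).
buffers = []
root = materialize(([1, 2], ([3], [4, 5])), buffers)
```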
*) Partitions HLO instructions along outer dimensions, based on a simple cost model.
*) Emits loop nests with dynamic outer loop bounds (for partitions); inner loop bounds stay static (for optimizations).
*) Dispatches parallel tasks on a thread pool for execution.

Simple element-wise fusion benchmark:
CPU: Intel Sandybridge with HyperThreading (16 cores) dL1:32KB dL2:256KB dL3:20MB

Benchmark                Time(ns)    CPU(ns)  Iterations   Throughput
---------------------------------------------------------------------
BM_ParallelFusion/T1     16821490   16740939         100  237.791MB/s
BM_ParallelFusion/T2      9175467   17826232         100  435.945MB/s
BM_ParallelFusion/T4      5106019   18875761         100  783.389MB/s
BM_ParallelFusion/T8      2833598   19624622         233    1.379GB/s
BM_ParallelFusion/T16     1995259   26541594         344    1.958GB/s

Performance on some selected model benchmarks (more work is needed here, but we wanted to get this CL in and iterate).
Benchmark runs with 16 threads; wall time reported in seconds.

InceptionResnetV2.inception_resnet_v2_200x200x20x1000_inference_xla_cpu
  wall_time(old): 7.97818803787
  wall_time(new): 4.328297019
InceptionV3.inception_v3_200x200x20x1000_inference_xla_cpu
  wall_time(old): 2.96792650223
  wall_time(new): 1.21296644211
InceptionResnetV2.inception_resnet_v2_200x200x20x1000_training_xla_cpu
  wall_time(old): 42.0342495441
  wall_time(new): 17.9182584286
InceptionV3.inception_v3_200x200x20x1000_training_xla_cpu
  wall_time(old): 6.99778497219
  wall_time(new): 3.95318603516
BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward
  wall_time(old): 11.869822979
  wall_time(new): 7.89778208733
BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward_backward
  wall_time(old): 38.1911079884
  wall_time(new): 29.8181960583

PiperOrigin-RevId: 159474444
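The scheme in the commit above can be sketched in Python: partition an element-wise operation along its outer dimension using a trivial cost model, then dispatch the partitions on a thread pool, with dynamic outer loop bounds per partition and a static inner loop. The cost model, names, and thresholds are invented; the real implementation emits LLVM IR in the XLA CPU backend.

```python
from concurrent.futures import ThreadPoolExecutor

def choose_partition_count(num_rows, row_bytes,
                           min_bytes_per_task=1 << 14, max_tasks=16):
    """Trivial cost model: give each task enough bytes to amortize dispatch."""
    by_cost = max(1, (num_rows * row_bytes) // min_bytes_per_task)
    return min(num_rows, max_tasks, by_cost)

def parallel_elementwise(a, b, num_threads=4):
    """out[i][j] = a[i][j] + b[i][j], partitioned along the outer dimension."""
    n = len(a)
    out = [None] * n
    row_bytes = len(a[0]) * 4 if n else 0
    tasks = choose_partition_count(n, row_bytes)
    step = -(-n // tasks)  # ceiling division: rows per partition

    def run(lo, hi):
        for i in range(lo, hi):                               # dynamic outer bounds
            out[i] = [x + y for x, y in zip(a[i], b[i])]      # static inner loop

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(run, lo, min(lo + step, n))
                   for lo in range(0, n, step)]
        for f in futures:
            f.result()  # propagate any worker exception
    return out

result = parallel_elementwise([[1, 2], [3, 4], [5, 6]],
                              [[10, 20], [30, 40], [50, 60]])
```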
xla_enable_fast_math
... where it belongs with its other debug-y friends.
While at it, also add it as a flag and flip it to a positive tone.
PiperOrigin-RevId: 158868016
Pipes the option through the xla.proto ExecutionOptions to HloModuleConfig,
where it can then be accessed throughout the compiler.
PiperOrigin-RevId: 157615458
This removes the circular dependency when creating an HloModule, then an
HloModuleConfig with the help of the module's entry computation, then assigning
the config back to the module. Now we have to pass a config when creating a
module, or a default config gets created.
This allows removing quite a bit of boilerplate code.
PiperOrigin-RevId: 157059949
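The new construction order described above can be sketched in Python: the config is supplied when the module is created (or a default is made on the spot), instead of being derived from the entry computation and assigned back afterwards. Class and field names are invented stand-ins for the real C++ types.

```python
class HloModuleConfig:
    def __init__(self, replica_count=1, debug_options=None):
        self.replica_count = replica_count
        self.debug_options = debug_options or {}

class HloModule:
    def __init__(self, name, config=None):
        # No circular dependency: the module never builds its own config
        # from its entry computation after the fact.
        self.name = name
        self.config = config if config is not None else HloModuleConfig()

m = HloModule("add")                                      # default config
n = HloModule("matmul", HloModuleConfig(replica_count=8)) # explicit config
```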
ExecutionOptions.
This simplifies the execution API by getting rid of two default params.
Also change HloModuleConfig so it stores each of the fields of
ExecutionOptions individually, instead of keeping an instance of the
ExecutionOptions proto.
This is necessary because HloModuleConfig already has a field derived
from shape_with_output_layout -- if we stored the ExecutionOptions proto
in HloModuleConfig, its shape wouldn't necessarily match the shape we
already have.
Change: 146477669
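A hedged Python sketch of why the commit above copies ExecutionOptions into HloModuleConfig field by field: the config already derives its entry output shape from shape_with_output_layout, so storing the whole proto could leave two disagreeing copies of that shape. The classes and fields below are invented stand-ins.

```python
class ExecutionOptions:
    def __init__(self, shape_with_output_layout, seed):
        self.shape_with_output_layout = shape_with_output_layout
        self.seed = seed

class HloModuleConfig:
    def __init__(self, options):
        # Copy each field individually; the shape is normalized once, at
        # construction, so the config holds a single source of truth.
        self.entry_output_shape = tuple(options.shape_with_output_layout)
        self.seed = options.seed

opts = ExecutionOptions(shape_with_output_layout=[2, 3], seed=42)
config = HloModuleConfig(opts)
opts.shape_with_output_layout.append(99)  # later mutation cannot desync config
```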
We want to put fields in this proto that aren't strictly related to
compilation.
Change: 146477500
Previously, XLA controlled the presence/absence of fast-math flags (FMF) via a
command-line flag. This patch changes things so we use a new CompileOptions
proto instead.
This proto lives in HloModuleConfig, and is passed to the service via
ExecuteRequest.
This change lets us entirely remove llvm_backend_flags.{h,cc}.
In addition, this change takes us from two fast-math flags to one. Previously
we tried to control "unsafe FP transformations" separately from "full fast
math". It turns out that LLVM is misleadingly inconsistent in how it handles
these. In the backend, they are indeed two separate options that can be
enabled/disabled independently. In the frontend, however, unsafe-fp-math
implies all the other FMFs.
As a result, it doesn't really make sense for XLA to attempt to split out these
two flags, at least not until LLVM changes how it handles them.
Change: 146183994
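The collapse described above can be sketched in Python: one fast-math knob in a CompileOptions-like record expands into the full set of LLVM-style fast-math flags, mirroring how the LLVM frontend treats unsafe-fp-math as implying the rest. The flag names follow LLVM's; the classes and function are invented for illustration.

```python
class CompileOptions:
    def __init__(self, enable_fast_math=True):
        self.enable_fast_math = enable_fast_math

def llvm_fast_math_flags(options):
    """Expand the single knob into individual LLVM fast-math flags."""
    on = options.enable_fast_math
    return {
        "nnan": on,            # assume no NaNs
        "ninf": on,            # assume no infinities
        "nsz": on,             # ignore the sign of zero
        "arcp": on,            # allow reciprocal approximations
        "unsafe-fp-math": on,  # in the frontend, this implies the rest
    }

off = llvm_fast_math_flags(CompileOptions(enable_fast_math=False))
on = llvm_fast_math_flags(CompileOptions(enable_fast_math=True))
```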
XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators.
XLA is still experimental; we are releasing it early to get the community involved.
Change: 143990941