path: root/tensorflow/compiler/xla/service/local_service.cc
Commit history (most recent first):
* [XLA] Rename all (Mutable)ArraySlice to absl::Span. (Tim Shen, 2018-08-30)
  PiperOrigin-RevId: 210998142
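  A minimal sketch of what the rename looks like at a call site (the function
  and variable names below are illustrative, not taken from local_service.cc):

      #include <cstdint>
      #include <vector>

      #include "absl/types/span.h"

      // Read-only view: tensorflow::gtl::ArraySlice<int64> becomes
      // absl::Span<const int64_t>; the mutable MutableArraySlice<int64>
      // becomes absl::Span<int64_t>.
      int64_t ElementCount(absl::Span<const int64_t> dimensions) {
        int64_t count = 1;
        for (int64_t dim : dimensions) count *= dim;
        return count;
      }

      // A std::vector converts to absl::Span implicitly at the call site:
      //   std::vector<int64_t> dims = {2, 3, 4};
      //   int64_t n = ElementCount(dims);  // n == 24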
* [XLA] Switch to absl::StrFormat. (Justin Lebar, 2018-08-27)
  Unlike Printf, StrFormat does not require type-length qualifiers such as %z or
  %ll, nor does it require calling c_str() to print strings, so those call sites
  are fixed up here as well.
  PiperOrigin-RevId: 210435915
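  A minimal sketch of the difference the message describes, using illustrative
  values rather than code from this file:

      #include <cstdint>
      #include <string>

      #include "absl/strings/str_format.h"

      std::string DescribeBuffer(const std::string& name, int64_t byte_size) {
        // With Printf-style APIs this needed a length qualifier and c_str():
        //   Printf("buffer %s is %lld bytes", name.c_str(), byte_size);
        // absl::StrFormat deduces the argument types, so %d handles any
        // integer width and %s accepts std::string directly.
        return absl::StrFormat("buffer %s is %d bytes", name, byte_size);
      }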
* [XLA] Use absl string types and functions instead of the TF versions. (Justin Lebar, 2018-08-23)
  Unfortunately this has to be one big patch, because e.g. absl::StrCat doesn't
  accept a TF StringPiece, but as soon as we switch to absl::string_view, we have
  to switch away from all of the TF functions.
  PiperOrigin-RevId: 209957896
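  A small illustration of why the change has to land all at once (the helper
  below is hypothetical, not part of this file): once a parameter becomes
  absl::string_view, the absl helpers consume it directly, while the old
  StringPiece-based TF helpers no longer apply.

      #include <string>

      #include "absl/strings/str_cat.h"
      #include "absl/strings/string_view.h"

      // absl::StrCat accepts absl::string_view without any conversion.
      std::string NotFoundMessage(absl::string_view computation_name) {
        return absl::StrCat("computation not found: ", computation_name);
      }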
* [XLA] gtl::optional -> absl::optional. (Yunxing Dai, 2018-08-21)
  PiperOrigin-RevId: 209686671
* [XLA] Use absl::make_unique instead of xla::MakeUnique. (Justin Lebar, 2018-08-20)
  Same for WrapUnique.
  PiperOrigin-RevId: 209531124
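  A minimal sketch of the two replacements named above (the struct and
  variables are illustrative):

      #include <memory>

      #include "absl/memory/memory.h"

      struct Options {
        int device_ordinal = 0;
      };

      void Example() {
        // xla::MakeUnique<T>(...)  ->  absl::make_unique<T>(...)
        std::unique_ptr<Options> options = absl::make_unique<Options>();

        // xla::WrapUnique(ptr)  ->  absl::WrapUnique(ptr)
        // Takes ownership of a raw pointer that was allocated with new.
        std::unique_ptr<Options> wrapped = absl::WrapUnique(new Options());
      }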
* Move xla_computation.* from xla/client/xla_client up to xla/client. (Mark Heffernan, 2018-07-25)
  Plan is to move everything in xla/client/xla_client up to xla/client and remove
  the directory. No functional change.
  PiperOrigin-RevId: 206055680
* Remove the ambiguity of device/host computation layouts within the HloModuleConfig. (A. Unique TensorFlower, 2018-06-19)
  PiperOrigin-RevId: 201284741
* Enable the natural layouts of the entry computation to flow into the parameters and result layouts of the entry ComputationLayout. (A. Unique TensorFlower, 2018-06-18)
  If the argument shapes passed in to the service.cc API do not have a layout,
  the caller is assumed to accept the natural layout propagated by the XLA
  compiler. Similarly, if the ExecutionOptions has a shape for the result but no
  layout is set in that shape, the caller is assumed to accept the natural layout
  propagated by the XLA compiler. The same applies to the ExecutableBuildOptions
  result_layout().
  PiperOrigin-RevId: 201070858
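  A sketch of what this means for a caller of the local client API, assuming
  the ExecutableBuildOptions and ShapeUtil interfaces of this period (the shape
  and values are illustrative):

      #include "tensorflow/compiler/xla/client/executable_build_options.h"
      #include "tensorflow/compiler/xla/shape_util.h"

      void ConfigureResultLayout() {
        xla::ExecutableBuildOptions build_options;

        // Option 1: say nothing about the result layout. Per the change
        // above, the compiler's natural layout for the result is accepted.

        // Option 2: request a specific layout, e.g. minor-to-major {1, 0}
        // (row-major) for a 2-D f32[128, 64] result.
        xla::Shape result_shape =
            xla::ShapeUtil::MakeShapeWithLayout(xla::F32, {128, 64}, {1, 0});
        build_options.set_result_layout(result_shape);
      }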
* [XLA] Redesign: delete versioned_computation_handle and compilation_cache. (A. Unique TensorFlower, 2018-06-07)
  PiperOrigin-RevId: 199673573
* [XLA] Add an unoptimized HLO output flag to ExecutableBuildOptions and to the XLA local Python client. (Roy Frostig, 2018-06-01)
  PiperOrigin-RevId: 198930874
* [XLA] Redesign: delete computation_tracker and user_computation. (A. Unique TensorFlower, 2018-05-31)
  PiperOrigin-RevId: 198743117
* [XLA] Redesign: delete the old service interface. (A. Unique TensorFlower, 2018-05-30)
  Deleted methods:
  - Computation
  - ComputeConstant
  - Execute
  - ExecuteAsync
  - ExecuteParallel
  - GetComputationStats
  - GetComputationShape
  - GetLocalShape
  - IsConstant
  - LoadComputationSnapshot
  - Op
  - SetReturnValue
  - SnapshotComputation
  PiperOrigin-RevId: 198669035
* Expose xla_disable_hlo_passes via ExecutableBuildOptions. (Sanjoy Das, 2018-05-30)
  PiperOrigin-RevId: 198654099
* [XLA] Switch replay_computation to use LocalClient. (Justin Lebar, 2018-05-30)
  This lets replay_computation build an executable once and run it multiple
  times. This is particularly important because in XLA:GPU, the first run of an
  executable does some autotuning and therefore is unrepresentative.
  This change removes --xla_hlo_profile_last_run, because I don't see how to
  support it in LocalClient -- LocalClient wants the do-profile bit to be set
  when we *compile*. (There may not be an easy fix for this; it worked with
  regular Client because we were recompiling every time we ran.)
  PiperOrigin-RevId: 198643577
* [XLA] Convert XLA to use xla::se as a namespace alias for ::stream_executor. (Justin Lebar, 2018-04-17)
  PiperOrigin-RevId: 193301997
* [XLA] Redesign: implement local client and local service interface. (A. Unique TensorFlower, 2018-03-25)
  PiperOrigin-RevId: 190291400
* [XLA] Implement the whole graph execution interface and make a test use XlaBuilder. (A. Unique TensorFlower, 2018-03-25)
  - Add Client::ExecuteGraph.
  - Make client_library_test_base also (partially) support XlaBuilder by using
    templates.
  - Make one test case in axpy_simple_test use XlaBuilder. The test was slightly
    changed because the builder does not currently expand implicit broadcasts
    automatically.
  PiperOrigin-RevId: 190268658
* [XLA] Only overwrite the hlo_profiling flag when it's not enabled by default. (Benjamin Kramer, 2018-03-22)
  This got broken in 504d103a405654f029e8902d97d4dd8f3aa07513.
  PiperOrigin-RevId: 190077360
* [XLA] Plumb HLO dump options via local client. (Chris Leary, 2018-03-20)
  PiperOrigin-RevId: 189851211
* [XLA:python] Plumb hlo_profile flag. (Chris Leary, 2018-03-16)
  PiperOrigin-RevId: 189377860
* [XLA] Plumb build options via local API. (Chris Leary, 2018-01-30)
  - Break build options into their own translation unit, for use from the local
    client and to mirror ExecutableRunOptions.
  - Add some ToString()s to aid debugging.
  - Add HLO graph generation regex to build options.
  - Add SWIG type map for ExecutableBuildOptions.
  Also fix a build issue occurring on some platforms with triangular_solve.
  PiperOrigin-RevId: 183837856
* [XLA] Add a DeviceAllocator* argument to compilation. (Justin Lebar, 2018-01-26)
  In a later change, the GPU backend will use this allocator to reserve scratch
  memory when trying out different convolution algorithms during compilation.
  PiperOrigin-RevId: 183469579
* [XLA] Add source mapping utility translation unit, use it in the local client. (Chris Leary, 2018-01-25)
  PiperOrigin-RevId: 183331075
* [XLA] Add source mapping support to SWIG API. (Chris Leary, 2018-01-17)
  PiperOrigin-RevId: 182292142
* [XLA] Expose replicas via local client API. (Chris Leary, 2018-01-04)
  PiperOrigin-RevId: 180868190
* Merged commit includes the following changes: (A. Unique TensorFlower, 2017-12-18)
  179277894 by gunan:
      Run buildifier on build file.
  179275101 by meheff:
      Replace DeviceMemoryBase with ShapedBuffer in XLA interfaces. Executable,
      TransferManager, and AllocationTracker now use ShapedBuffer to hold device
      memory addresses holding XLA data. Most of the change is straightforward,
      with the exception of AllocationTracker, which was mostly rewritten (and
      simplified), plus some refactoring in the CPU executable. Also, have
      ShapedBuffer hold the on-host and on-device Shapes, which are the shapes
      of the representation of the data on the host and device, respectively.
      This is necessary because with cl/178624364 the on-host and on-device
      shapes may no longer be equal.
  179265385 by A. Unique TensorFlower:
      Return an error rather than CHECK-failing in
      Executable::ExecuteOnStreamWrapper.
  179264551 by dandelion:
      Internal fixes.
  PiperOrigin-RevId: 179277894
* [TF:XLA] Clean up unused XLA options and functions. (A. Unique TensorFlower, 2017-11-10)
  PiperOrigin-RevId: 175217850
* Remove "hybrid" HloModuleConfig option. The option was used to generate ↵Gravatar Mark Heffernan2017-10-04
| | | | | | | | executables which only generated the array values of tuple-shaped outputs, not the tuple index tables.. With cl/170133015, ShapedBuffers which hold the computation output now have materialized tuples with these index tables so this option is no longer desired or necessary. No functional change. Just cleanup. PiperOrigin-RevId: 171035738
* Add methods to convert between Literals and ShapedBuffers to LocalClient. (Mark Heffernan, 2017-09-21)
  These "conversion" methods copy the data to/from the device into/from
  literals. Also fix various issues noticed along the way:
  - Move LocalClient tests into open source.
  - Add a proper == operator to Literals.
  - Add a << overload for streaming Literals to output.
  - Add Literal::GetSubliteral methods.
  - Remove unused AllocateBufferOnDevice methods from LocalClient and
    LocalService.
  PiperOrigin-RevId: 169606342
* [TF:XLA] Use HloEvaluator for ComputeConstant, removing the need for a dedicated compute-constant backend. (Kay Zhu, 2017-08-10)
  PiperOrigin-RevId: 164940970
* [XLA] Refactor CreateModuleConfig to share code between multiple call sites. (Eli Bendersky, 2017-07-31)
  Previously Service, LocalService, and CompileOnlyService had their own code to
  create a new HloModuleConfig, with much repetition (and some omissions);
  collect all these uses in a single method.
  PiperOrigin-RevId: 163766869
* [XLA] Get rid of ServiceFlags by absorbing it into DebugOptions. (A. Unique TensorFlower, 2017-07-18)
  After this change HloModuleConfig::hlo_profiling_enabled_ is redundant. I'll
  remove it in a future change.
  PiperOrigin-RevId: 162436163
* [XLA] Remove dead "in-client" code. (Mark Heffernan, 2017-06-21)
  Remove the Service::runs_in_client_process_ field and its dead user. This was
  previously used by the "InProcess" methods, which have been replaced with the
  LocalClient API.
  PiperOrigin-RevId: 159759455
* [XLA] Remove unused factory in local_service. (Eli Bendersky, 2017-06-21)
  PiperOrigin-RevId: 159712806
* [XLA] Move replica_count out of Backend. (Eli Bendersky, 2017-06-20)
  This is an intermediate step in making replica_count more explicitly
  programmable (rather than set by a flag). There is no reason for the Backend to
  hold replica_count - it was only holding it as a container with no additional
  semantics.
  PiperOrigin-RevId: 159626528
* [XLA:CPU] Thread-parallel CPU backend (work in progress). (A. Unique TensorFlower, 2017-06-19)
  - Partitions HLO instructions along outer dimensions, based on a simple cost
    model.
  - Emits loop nests with dynamic outer loop bounds (for partitions), leaving
    inner loop bounds static (for optimizations).
  - Dispatches parallel tasks on a thread pool for execution.

  Simple element-wise fusion benchmark:
  CPU: Intel Sandybridge with HyperThreading (16 cores) dL1:32KB dL2:256KB dL3:20MB
  Benchmark                 Time(ns)    CPU(ns)  Iterations
  ----------------------------------------------------------
  BM_ParallelFusion/T1      16821490   16740939         100  237.791MB/s
  BM_ParallelFusion/T2       9175467   17826232         100  435.945MB/s
  BM_ParallelFusion/T4       5106019   18875761         100  783.389MB/s
  BM_ParallelFusion/T8       2833598   19624622         233    1.379GB/s
  BM_ParallelFusion/T16      1995259   26541594         344    1.958GB/s

  Performance on some selected model benchmarks (more work is needed here, but I
  wanted to get this CL in and iterate). Benchmarks run with 16 threads; wall
  time is reported in seconds.
  InceptionResnetV2.inception_resnet_v2_200x200x20x1000_inference_xla_cpu
      wall_time(old): 7.97818803787    wall_time(new): 4.328297019
  InceptionV3.inception_v3_200x200x20x1000_inference_xla_cpu
      wall_time(old): 2.96792650223    wall_time(new): 1.21296644211
  InceptionResnetV2.inception_resnet_v2_200x200x20x1000_training_xla_cpu
      wall_time(old): 42.0342495441    wall_time(new): 17.9182584286
  InceptionV3.inception_v3_200x200x20x1000_training_xla_cpu
      wall_time(old): 6.99778497219    wall_time(new): 3.95318603516
  BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward
      wall_time(old): 11.869822979     wall_time(new): 7.89778208733
  BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward_backward
      wall_time(old): 38.1911079884    wall_time(new): 29.8181960583
  PiperOrigin-RevId: 159474444
* [XLA] Remove gpu/gpu_backend-specific flags. (Eli Bendersky, 2017-06-16)
  Move useful flags into debug_options, and leave some less used flags out - they
  can be propagated through debug_options if required (for now there's too much
  duplication between them and what's already inside).
  PiperOrigin-RevId: 159261661
* Add ComputationPlacer to assign device ids for replicated model-parallel computations. (HyoukJoong Lee, 2017-06-14)
  PiperOrigin-RevId: 159056198
* [XLA] Simplify Shape traversal visitors. (Mark Heffernan, 2017-06-06)
  Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status
  form because most uses of the traversal methods do not use it, and remove the
  is_leaf parameter from ShapeTree::ForEach* as it is not frequently used.
  PiperOrigin-RevId: 158201574
* [XLA:CPU] Prep work for thread-parallel XLA CPU backend. (A. Unique TensorFlower, 2017-05-12)
  - Plumbs the intra-op thread parallelism value through to the XLA backend.
  - Service execution uses the inter/intra-op pools from the backend.
  - LocalService execution uses the intra-op pool from the backend for XLA
    parallel ops, and the intra-op pool passed in ExecutableRunOptions for Eigen
    ops.
  PiperOrigin-RevId: 155891730
* [XLA] Improve the documentation of our service/client classes a bit. (Eli Bendersky, 2017-05-11)
  Also kill dead code.
  PiperOrigin-RevId: 155814264
* Refactor XLA's CompileAheadOfTime out of LocalClient into a new CompileOnlyClient class, and likewise from LocalService into a new CompileOnlyService class. (A. Unique TensorFlower, 2017-05-05)
  This also renames AheadOfTimeComputationInstance to AotComputationInstance for
  consistency with AotCompilationResult and AotCompilationOptions in
  compiler/xla/service/compiler.h.
  Change: 155252320
* Set hlo_profile on module_config if xla_hlo_profile flag is given. (Jacques Pienaar, 2017-03-21)
  Change: 150817673
* [XLA] Add support for dumping computations during CompileAheadOfTime. Remove '/' and '\' characters from path names of dumped graphs. (Peter Hawkins, 2017-03-15)
  Change: 150231912
* [XLA] Remove LocalClient::ExecuteLocally() in favor of LocalClient::Compile() and LocalExecutable::Run(). (Peter Hawkins, 2017-03-07)
  Change: 149482633
* [TF:XLA] Remove support for client-allocated result buffers. (Peter Hawkins, 2017-03-07)
  This code path is unused; TensorFlow ended up settling on having XLA allocate
  result buffers using TensorFlow's allocator. Remove it to reduce the
  proliferation of ExecuteXYZ() methods.
  Change: 149423775
* [XLA:GPU] Cache GPU substreams across executions. (A. Unique TensorFlower, 2017-03-02)
  Change: 149063035
* [XLA] Properly version outfeed and send operations in UserComputation. (Mark Heffernan, 2017-02-21)
  Previously, outfeed and send operations were unconditionally emitted during
  UserComputation lowering even if the outfeed/send was not in the requested
  version (computation snapshot). This CL versions these operations. Also,
  opportunistically improve logging in UserComputation, Service, and
  ComputationTracker, which was used to root-cause the underlying bug.
  Change: 148170893
* [XLA] Use `Pool<se::Stream>` as the stream cache in the backend, and use smart pointers rather than explicitly releasing acquired streams. (A. Unique TensorFlower, 2017-02-15)
  Change: 147620836
* Set the enable_hlo_profiling flag in HloModuleConfig. (HyoukJoong Lee, 2017-02-08)
  Change: 146981790