path: root/tensorflow/compiler/xla/service/local_service.cc
Commit history (most recent first):
* [XLA] Rename all (Mutable)ArraySlice to absl::Span. (Tim Shen, 2018-08-30)
  PiperOrigin-RevId: 210998142
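  A minimal sketch of what the rename looks like at a call site (the function
  and variable names below are illustrative, not taken from local_service.cc):

      #include <cstdint>
      #include <vector>

      #include "absl/types/span.h"

      // Read-only view: tensorflow::gtl::ArraySlice<int64> becomes
      // absl::Span<const int64_t>; the mutable MutableArraySlice<int64>
      // becomes absl::Span<int64_t>.
      int64_t ElementCount(absl::Span<const int64_t> dimensions) {
        int64_t count = 1;
        for (int64_t dim : dimensions) count *= dim;
        return count;
      }

      // A std::vector converts to absl::Span implicitly at the call site:
      //   std::vector<int64_t> dims = {2, 3, 4};
      //   int64_t n = ElementCount(dims);  // n == 24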
* [XLA] Switch to absl::StrFormat. (Justin Lebar, 2018-08-27)
  Unlike Printf, StrFormat does not require type-length qualifiers such as %z or
  %ll, nor does it require calling c_str() to print strings, so those call sites
  are fixed up here as well.
  PiperOrigin-RevId: 210435915
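  A minimal sketch of the difference the message describes, using illustrative
  values rather than code from this file:

      #include <cstdint>
      #include <string>

      #include "absl/strings/str_format.h"

      std::string DescribeBuffer(const std::string& name, int64_t byte_size) {
        // With Printf-style APIs this needed a length qualifier and c_str():
        //   Printf("buffer %s is %lld bytes", name.c_str(), byte_size);
        // absl::StrFormat deduces the argument types, so %d handles any
        // integer width and %s accepts std::string directly.
        return absl::StrFormat("buffer %s is %d bytes", name, byte_size);
      }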
* [XLA] Use absl string types and functions instead of the TF versions. (Justin Lebar, 2018-08-23)
  Unfortunately this has to be one big patch, because e.g. absl::StrCat doesn't
  accept a TF StringPiece, but as soon as we switch to absl::string_view, we have
  to switch away from all of the TF functions.
  PiperOrigin-RevId: 209957896
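  A small illustration of why the change has to land all at once (the helper
  below is hypothetical, not part of this file): once a parameter becomes
  absl::string_view, the absl helpers consume it directly, while the old
  StringPiece-based TF helpers no longer apply.

      #include <string>

      #include "absl/strings/str_cat.h"
      #include "absl/strings/string_view.h"

      // absl::StrCat accepts absl::string_view without any conversion.
      std::string NotFoundMessage(absl::string_view computation_name) {
        return absl::StrCat("computation not found: ", computation_name);
      }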
* [XLA] gtl::optional -> absl::optional. (Yunxing Dai, 2018-08-21)
  PiperOrigin-RevId: 209686671
* [XLA] Use absl::make_unique instead of xla::MakeUnique. (Justin Lebar, 2018-08-20)
  Same for WrapUnique.
  PiperOrigin-RevId: 209531124
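  A minimal sketch of the two replacements named above (the struct and
  variables are illustrative):

      #include <memory>

      #include "absl/memory/memory.h"

      struct Options {
        int device_ordinal = 0;
      };

      void Example() {
        // xla::MakeUnique<T>(...)  ->  absl::make_unique<T>(...)
        std::unique_ptr<Options> options = absl::make_unique<Options>();

        // xla::WrapUnique(ptr)  ->  absl::WrapUnique(ptr)
        // Takes ownership of a raw pointer that was allocated with new.
        std::unique_ptr<Options> wrapped = absl::WrapUnique(new Options());
      }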
* Move xla_computation.* from xla/client/xla_client up to xla/client. (Mark Heffernan, 2018-07-25)
  Plan is to move everything in xla/client/xla_client up to xla/client and remove
  the directory. No functional change.
  PiperOrigin-RevId: 206055680
* Remove the ambiguity of device/host computation layouts within the HloModuleConfig. (A. Unique TensorFlower, 2018-06-19)
  PiperOrigin-RevId: 201284741
* Enable the natural layouts of the entry computation to flow into the parameters and result layouts of the entry ComputationLayout. (A. Unique TensorFlower, 2018-06-18)
  If the argument shapes passed in to the service.cc API do not have a layout,
  the caller is assumed to accept the natural layout propagated by the XLA
  compiler. Similarly, if the ExecutionOptions has a shape for the result but no
  layout is set in that shape, the caller is assumed to accept the natural layout
  propagated by the XLA compiler. The same applies to the ExecutableBuildOptions
  result_layout().
  PiperOrigin-RevId: 201070858
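  A sketch of what this means for a caller of the local client API, assuming
  the ExecutableBuildOptions and ShapeUtil interfaces of this period (the shape
  and values are illustrative):

      #include "tensorflow/compiler/xla/client/executable_build_options.h"
      #include "tensorflow/compiler/xla/shape_util.h"

      void ConfigureResultLayout() {
        xla::ExecutableBuildOptions build_options;

        // Option 1: say nothing about the result layout. Per the change
        // above, the compiler's natural layout for the result is accepted.

        // Option 2: request a specific layout, e.g. minor-to-major {1, 0}
        // (row-major) for a 2-D f32[128, 64] result.
        xla::Shape result_shape =
            xla::ShapeUtil::MakeShapeWithLayout(xla::F32, {128, 64}, {1, 0});
        build_options.set_result_layout(result_shape);
      }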
* [XLA] Redesign: delete versioned_computation_handle and compilation_cache. (A. Unique TensorFlower, 2018-06-07)
  PiperOrigin-RevId: 199673573
* [XLA] Add an unoptimized HLO output flag to ExecutableBuildOptions and to the XLA local Python client. (Roy Frostig, 2018-06-01)
  PiperOrigin-RevId: 198930874
* [XLA] Redesign: delete computation_tracker and user_computation. (A. Unique TensorFlower, 2018-05-31)
  PiperOrigin-RevId: 198743117
* [XLA] Redesign: delete the old service interface. (A. Unique TensorFlower, 2018-05-30)
  Deleted methods:
  - Computation
  - ComputeConstant
  - Execute
  - ExecuteAsync
  - ExecuteParallel
  - GetComputationStats
  - GetComputationShape
  - GetLocalShape
  - IsConstant
  - LoadComputationSnapshot
  - Op
  - SetReturnValue
  - SnapshotComputation
  PiperOrigin-RevId: 198669035
* Expose xla_disable_hlo_passes via ExecutableBuildOptions. (Sanjoy Das, 2018-05-30)
  PiperOrigin-RevId: 198654099
* [XLA] Switch replay_computation to use LocalClient. (Justin Lebar, 2018-05-30)
  This lets replay_computation build an executable once and run it multiple
  times. This is particularly important because in XLA:GPU, the first run of an
  executable does some autotuning and therefore is unrepresentative.
  This change removes --xla_hlo_profile_last_run, because I don't see how to
  support it in LocalClient -- LocalClient wants the do-profile bit to be set
  when we *compile*. (There may not be an easy fix for this; it worked with
  regular Client because we were recompiling every time we ran.)
  PiperOrigin-RevId: 198643577
* [XLA] Convert XLA to use xla::se as a namespace alias for ::stream_executor. (Justin Lebar, 2018-04-17)
  PiperOrigin-RevId: 193301997
* [XLA] Redesign: implement local client and local service interface. (A. Unique TensorFlower, 2018-03-25)
  PiperOrigin-RevId: 190291400
* [XLA] Implement the whole graph execution interface and make a test use XlaBuilder. (A. Unique TensorFlower, 2018-03-25)
  - Add Client::ExecuteGraph.
  - Make client_library_test_base also (partially) support XlaBuilder by using
    templates.
  - Make one test case in axpy_simple_test use XlaBuilder. The test was slightly
    changed because the builder does not currently expand implicit broadcasts
    automatically.
  PiperOrigin-RevId: 190268658
* [XLA] Only overwrite the hlo_profiling flag when it's not enabled by default. (Benjamin Kramer, 2018-03-22)
  This got broken in 504d103a405654f029e8902d97d4dd8f3aa07513.
  PiperOrigin-RevId: 190077360
* [XLA] Plumb HLO dump options via local client. (Chris Leary, 2018-03-20)
  PiperOrigin-RevId: 189851211
* [XLA:python] Plumb hlo_profile flag. (Chris Leary, 2018-03-16)
  PiperOrigin-RevId: 189377860
* [XLA] Plumb build options via local API. (Chris Leary, 2018-01-30)
  - Break build options into their own translation unit, for use from the local
    client and to mirror ExecutableRunOptions.
  - Add some ToString()s to aid debugging.
  - Add HLO graph generation regex to build options.
  - Add SWIG type map for ExecutableBuildOptions.
  Also fix a build issue occurring on some platforms with triangular_solve.
  PiperOrigin-RevId: 183837856
* [XLA] Add a DeviceAllocator* argument to compilation. (Justin Lebar, 2018-01-26)
  In a later change, the GPU backend will use this allocator to reserve scratch
  memory when trying out different convolution algorithms during compilation.
  PiperOrigin-RevId: 183469579
* [XLA] Add source mapping utility translation unit, use it in the local client. (Chris Leary, 2018-01-25)
  PiperOrigin-RevId: 183331075
* [XLA] Add source mapping support to SWIG API. (Chris Leary, 2018-01-17)
  PiperOrigin-RevId: 182292142
* [XLA] Expose replicas via local client API. (Chris Leary, 2018-01-04)
  PiperOrigin-RevId: 180868190
* Merged commit includes the following changes: (A. Unique TensorFlower, 2017-12-18)
  179277894 by gunan:
      Run buildifier on build file.
  179275101 by meheff:
      Replace DeviceMemoryBase with ShapedBuffer in XLA interfaces. Executable,
      TransferManager, and AllocationTracker now use ShapedBuffer to hold device
      memory addresses holding XLA data. Most of the change is straightforward,
      with the exception of AllocationTracker, which was mostly rewritten (and
      simplified), plus some refactoring in the CPU executable. Also, have
      ShapedBuffer hold the on-host and on-device Shapes, which are the shapes
      of the representation of the data on the host and device, respectively.
      This is necessary because with cl/178624364 the on-host and on-device
      shapes may no longer be equal.
  179265385 by A. Unique TensorFlower:
      Return an error rather than CHECK-failing in
      Executable::ExecuteOnStreamWrapper.
  179264551 by dandelion:
      Internal fixes.
  PiperOrigin-RevId: 179277894
* [TF:XLA] Clean up unused XLA options and functions. (A. Unique TensorFlower, 2017-11-10)
  PiperOrigin-RevId: 175217850
* Remove "hybrid" HloModuleConfig option. The option was used to generate ↵Gravatar Mark Heffernan2017-10-04
| | | | | | | | executables which only generated the array values of tuple-shaped outputs, not the tuple index tables.. With cl/170133015, ShapedBuffers which hold the computation output now have materialized tuples with these index tables so this option is no longer desired or necessary. No functional change. Just cleanup. PiperOrigin-RevId: 171035738
* Add methods to convert between Literals and ShapedBuffers to LocalClient. (Mark Heffernan, 2017-09-21)
  These "conversion" methods copy the data to/from the device into/from
  literals. Also fix various issues noticed along the way:
  - Move LocalClient tests into open source.
  - Add a proper == operator to Literals.
  - Add a << overload for streaming Literals to output.
  - Add Literal::GetSubliteral methods.
  - Remove unused AllocateBufferOnDevice methods from LocalClient and
    LocalService.
  PiperOrigin-RevId: 169606342
* [TF:XLA] Use HloEvaluator for ComputeConstant, removing the need for a dedicated compute-constant backend. (Kay Zhu, 2017-08-10)
  PiperOrigin-RevId: 164940970
* [XLA] Refactor CreateModuleConfig to share code between multiple call sites. (Eli Bendersky, 2017-07-31)
  Previously Service, LocalService, and CompileOnlyService had their own code to
  create a new HloModuleConfig, with much repetition (and some omissions);
  collect all these uses in a single method.
  PiperOrigin-RevId: 163766869
* [XLA] Get rid of ServiceFlags by absorbing it into DebugOptions. (A. Unique TensorFlower, 2017-07-18)
  After this change HloModuleConfig::hlo_profiling_enabled_ is redundant. I'll
  remove it in a future change.
  PiperOrigin-RevId: 162436163
* [XLA] Remove dead "in-client" code. (Mark Heffernan, 2017-06-21)
  Remove the Service::runs_in_client_process_ field and its dead user. This was
  previously used by the "InProcess" methods, which have been replaced with the
  LocalClient API.
  PiperOrigin-RevId: 159759455
* [XLA] Remove unused factory in local_service. (Eli Bendersky, 2017-06-21)
  PiperOrigin-RevId: 159712806
* [XLA] Move replica_count out of Backend. (Eli Bendersky, 2017-06-20)
  This is an intermediate step in making replica_count more explicitly
  programmable (rather than set by a flag). There is no reason for the Backend to
  hold replica_count - it was only holding it as a container with no additional
  semantics.
  PiperOrigin-RevId: 159626528
* [XLA:CPU] Thread-parallel CPU backend (work in progress). (A. Unique TensorFlower, 2017-06-19)
  - Partitions HLO instructions along outer dimensions, based on a simple cost
    model.
  - Emits loop nests with dynamic outer loop bounds (for partitions), leaving
    inner loop bounds static (for optimizations).
  - Dispatches parallel tasks on a thread pool for execution.

  Simple element-wise fusion benchmark:
  CPU: Intel Sandybridge with HyperThreading (16 cores) dL1:32KB dL2:256KB dL3:20MB
  Benchmark                 Time(ns)    CPU(ns)  Iterations
  ----------------------------------------------------------
  BM_ParallelFusion/T1      16821490   16740939         100  237.791MB/s
  BM_ParallelFusion/T2       9175467   17826232         100  435.945MB/s
  BM_ParallelFusion/T4       5106019   18875761         100  783.389MB/s
  BM_ParallelFusion/T8       2833598   19624622         233    1.379GB/s
  BM_ParallelFusion/T16      1995259   26541594         344    1.958GB/s

  Performance on some selected model benchmarks (more work is needed here, but I
  wanted to get this CL in and iterate). Benchmarks run with 16 threads; wall
  time is reported in seconds.
  InceptionResnetV2.inception_resnet_v2_200x200x20x1000_inference_xla_cpu
      wall_time(old): 7.97818803787    wall_time(new): 4.328297019
  InceptionV3.inception_v3_200x200x20x1000_inference_xla_cpu
      wall_time(old): 2.96792650223    wall_time(new): 1.21296644211
  InceptionResnetV2.inception_resnet_v2_200x200x20x1000_training_xla_cpu
      wall_time(old): 42.0342495441    wall_time(new): 17.9182584286
  InceptionV3.inception_v3_200x200x20x1000_training_xla_cpu
      wall_time(old): 6.99778497219    wall_time(new): 3.95318603516
  BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward
      wall_time(old): 11.869822979     wall_time(new): 7.89778208733
  BenchmarkRNN.rnn_basic_lstm_64x512_4x20_xla_cpu_forward_backward
      wall_time(old): 38.1911079884    wall_time(new): 29.8181960583
  PiperOrigin-RevId: 159474444
* [XLA] Remove gpu/gpu_backend-specific flags. (Eli Bendersky, 2017-06-16)
  Move useful flags into debug_options, and leave some less used flags out - they
  can be propagated through debug_options if required (for now there's too much
  duplication between them and what's already inside).
  PiperOrigin-RevId: 159261661
* Add ComputationPlacer to assign device ids for replicated model-parallel computations. (HyoukJoong Lee, 2017-06-14)
  PiperOrigin-RevId: 159056198
* [XLA] Simplify Shape traversal visitors. (Mark Heffernan, 2017-06-06)
  Simplify shape traversal visitors in ShapeUtil and ShapeTree. Add a non-Status
  form because most uses of the traversal methods do not use it, and remove the
  is_leaf parameter from ShapeTree::ForEach* as it is not frequently used.
  PiperOrigin-RevId: 158201574
* [XLA:CPU] Prep work for thread-parallel XLA CPU backend. (A. Unique TensorFlower, 2017-05-12)
  - Plumbs the intra-op thread parallelism value through to the XLA backend.
  - Service execution uses the inter/intra-op pools from the backend.
  - LocalService execution uses the intra-op pool from the backend for XLA
    parallel ops, and the intra-op pool passed in ExecutableRunOptions for Eigen
    ops.
  PiperOrigin-RevId: 155891730
* [XLA] Improve the documentation of our service/client classes a bit. (Eli Bendersky, 2017-05-11)
  Also kill dead code.
  PiperOrigin-RevId: 155814264
* Refactor XLA's CompileAheadOfTime out of LocalClient into a new CompileOnlyClient class, and likewise from LocalService into a new CompileOnlyService class. (A. Unique TensorFlower, 2017-05-05)
  This also renames AheadOfTimeComputationInstance to AotComputationInstance for
  consistency with AotCompilationResult and AotCompilationOptions in
  compiler/xla/service/compiler.h.
  Change: 155252320
* Set hlo_profile on module_config if xla_hlo_profile flag is given. (Jacques Pienaar, 2017-03-21)
  Change: 150817673
* [XLA] Add support for dumping computations during CompileAheadOfTime. Remove '/' and '\' characters from path names of dumped graphs. (Peter Hawkins, 2017-03-15)
  Change: 150231912
* [XLA] Remove LocalClient::ExecuteLocally() in favor of LocalClient::Compile() and LocalExecutable::Run(). (Peter Hawkins, 2017-03-07)
  Change: 149482633
* [TF:XLA] Remove support for client-allocated result buffers. (Peter Hawkins, 2017-03-07)
  This code path is unused; TensorFlow ended up settling on having XLA allocate
  result buffers using TensorFlow's allocator. Remove it to reduce the
  proliferation of ExecuteXYZ() methods.
  Change: 149423775
* [XLA:GPU] Cache GPU substreams across executions. (A. Unique TensorFlower, 2017-03-02)
  Change: 149063035
* [XLA] Properly version outfeed and send operations in UserComputation. (Mark Heffernan, 2017-02-21)
  Previously, outfeed and send operations were unconditionally emitted during
  UserComputation lowering even if the outfeed/send was not in the requested
  version (computation snapshot). This CL versions these operations. Also,
  opportunistically improve logging in UserComputation, Service, and
  ComputationTracker, which was used to root-cause the underlying bug.
  Change: 148170893
* [XLA] Use `Pool<se::Stream>` as the stream cache in the backend, and use smart pointers rather than explicitly releasing acquired streams. (A. Unique TensorFlower, 2017-02-15)
  Change: 147620836
* Set the enable_hlo_profiling flag in HloModuleConfig. (HyoukJoong Lee, 2017-02-08)
  Change: 146981790