Commit log
PiperOrigin-RevId: 214824023
PiperOrigin-RevId: 213771631
PiperOrigin-RevId: 213764810
The only TensorFlow op that uses XlaSort is nn.top_k, so we add a test case
using nn.top_k.
PiperOrigin-RevId: 213763591
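For readers unfamiliar with the op being tested, here is a minimal pure-Python sketch (not the TensorFlow implementation) of the semantics such a test exercises: the k largest values in descending order, ties broken by the lower index, as tf.nn.top_k documents.

```python
def top_k(values, k):
    # Order indices by value descending; ties go to the lower index,
    # matching tf.nn.top_k's documented tie-breaking behavior.
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    return [values[i] for i in order], order
```

For example, `top_k([1, 5, 3, 5], 2)` yields values `[5, 5]` with indices `[1, 3]`.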
Re-assigning unique IDs broke serialization of HloSchedule, and keeping IDs stable improves the fidelity of the proto serialization. This change requires that instructions in HLO module protos have valid, module-scope-unique IDs, so the XLA builder now hands out module-scope-unique IDs. Previously, instruction IDs were only unique within the scope of a computation.
PiperOrigin-RevId: 212692339
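A toy Python sketch of the difference (names hypothetical, not the XLA builder API): drawing IDs from one module-scope counter keeps them unique even across computations built by different builders, whereas per-computation counters would each restart at zero.

```python
import itertools

class BuilderSketch:
    """Hypothetical builder: instruction IDs come from one module-scope
    counter, so no two instructions in the whole module share an ID."""
    _module_ids = itertools.count()  # shared by every builder instance

    def __init__(self, name):
        self.name = name
        self.instruction_ids = []

    def add_instruction(self):
        uid = next(BuilderSketch._module_ids)  # unique across all builders
        self.instruction_ids.append(uid)
        return uid
```

Two builders for two computations then produce disjoint ID sets, which is what stable proto serialization relies on in this sketch.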
PiperOrigin-RevId: 212289067
PiperOrigin-RevId: 211895566
consistently
StringPiece is an alias for absl::string_view, and InlinedVector is aliased to absl::InlinedVector. StrCat is compatible, so swapping it out is safe.
PiperOrigin-RevId: 211691840
This prevents these build-time rules from accessing any GPUs that might
be present on the build machine and interfering with GPU tests that
might be running concurrently.
PiperOrigin-RevId: 211647681
dependencies as well.
PiperOrigin-RevId: 211038094
PiperOrigin-RevId: 210998142
PiperOrigin-RevId: 210495040
PiperOrigin-RevId: 210040583
PiperOrigin-RevId: 210038492
Unfortunately this has to be one big patch, because e.g. absl::StrCat
doesn't accept a TF StringPiece, but as soon as we switch to
absl::string_view, we have to switch away from all of the TF functions.
PiperOrigin-RevId: 209957896
PiperOrigin-RevId: 209686671
Same for WrapUnique.
PiperOrigin-RevId: 209531124
This lets us remove XlaCompiledCpuFunction::args_ and some awkwardness from
XlaCompiledCpuFunction::Run.
PiperOrigin-RevId: 208309249
The BufferInfo represents information about buffer assignment in
XlaCompiledCpuFunction. Arg sizes and temp sizes are now derived from
BufferInfo instead of being discrete sources of information.
StaticData() is also now private; tfcompile clients should not need to access
it directly.
PiperOrigin-RevId: 208283305
As a follow-on cleanup for cl/206980796 ("Overhaul XLA:CPU's calling
convention.") I want to introduce a BufferInfo class that encapsulates whether a
buffer is a constant, an entry parameter, or a temp without using the fragile
"size < 0" scheme I have today. To do this efficiently I need a place to put
the BufferInfo class that will be visible to MallocContiguousBuffers. Instead
of creating (what seemed to me) an odd layering with BufferInfo in aot/runtime.h,
I decided to pull the runtime into xla_compiled_cpu_function, since that's its
only user.
PiperOrigin-RevId: 207333245
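As a rough illustration of the idea (hypothetical names, not the actual C++ class), encoding the kind as an explicit field replaces the sign-of-size trick:

```python
from enum import Enum

class Kind(Enum):
    CONSTANT = "constant"
    ENTRY_PARAMETER = "entry_parameter"
    TEMP = "temp"

class BufferInfoSketch:
    """Hypothetical sketch: the buffer kind is an explicit field, so `size`
    can always hold the real size instead of a negative marker value."""
    def __init__(self, kind, size, entry_param_index=None):
        self.kind = kind
        self.size = size
        self.entry_param_index = entry_param_index  # only for entry parameters

    def is_temp(self):
        return self.kind is Kind.TEMP
```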
And fix two lint issues.
PiperOrigin-RevId: 207051473
aligned_buffer_bytes in compiler/aot/runtime.cc was checking sizes[i] == -1 (as
opposed to checking sizes[i] < 0) to decide whether sizes[i] should count
towards the total size.
Original CL description:
Overhaul XLA:CPU's calling convention.
This CL introduces a clean separation between calls to "thread local" and
"global" computations in XLA:CPU.
Global computations are:
- kWhile body and condition computations
- kConditional true and false computations
- kCall callees
Parameter and result buffers for these calls are assigned a static
BufferAllocation::Slice by buffer assignment, so pointers to result buffers
and parameters don't have to be explicitly passed in. In fact, passing in
result and parameter buffers is actively misleading, because in cases like:
  while_condition {
    val = (s32[], pred[]) infeed()
    ROOT result = get-tuple-element(val), index=0
  }
there is no instruction that explicitly copies the result of the computation
into the result buffer. Instead, it is up to the caller to pick up the correct
result buffer by asking buffer assignment (which here would be the buffer where
infeed wrote its second tuple component).
Thread local computations are all the other nested computations except fusion,
e.g. computations used by kMap and kReduce.
Parameter and result buffers for these calls are assigned a "thread local"
BufferAllocation::Slice, which in XLA:CPU is mapped to allocas. Since these are
not static addresses, we *do* need to pass in parameter and result buffers. The
output is written to the result buffer by "allocating" the storage for the
root into the result buffer passed in by the caller.
There are two cleanup items that I kept out of this CL to make reviews easier:
- We should rename "temps" to something more generic, like "buffer_table".
  I'll do that in a followup CL.
- We should use GatherComputationsByAllocationType from buffer_assignment.cc to
  CHECK that we use thread local calls for thread local callees and global
  calls for global callees.
PiperOrigin-RevId: 206980796
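The effect of the sizes[i] == -1 check can be sketched in a few lines of Python (the marker values here are illustrative, not the real encoding): any negative entry denotes a non-buffer, so only the corrected predicate skips all of them.

```python
def total_buffer_bytes(sizes, is_marker):
    # Sum the sizes of real buffers, skipping marker entries.
    return sum(s for s in sizes if not is_marker(s))

sizes = [128, -1, -2, 64]  # -1 and -2: hypothetical non-buffer markers
buggy = total_buffer_bytes(sizes, lambda s: s == -1)  # the -2 leaks into the sum
fixed = total_buffer_bytes(sizes, lambda s: s < 0)    # all markers skipped
```

With these inputs the buggy total comes out 2 bytes short (190 vs. 192), which is the kind of discrepancy the fix addresses.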
PiperOrigin-RevId: 206855848
This CL introduces a clean separation between calls to "thread local" and
"global" computations in XLA:CPU.
Global computations are:
- kWhile body and condition computations
- kConditional true and false computations
- kCall callees
Parameter and result buffers for these calls are assigned a static
BufferAllocation::Slice by buffer assignment, so pointers to result buffers
and parameters don't have to be explicitly passed in. In fact, passing in
result and parameter buffers is actively misleading, because in cases like:
  while_condition {
    val = (s32[], pred[]) infeed()
    ROOT result = get-tuple-element(val), index=0
  }
there is no instruction that explicitly copies the result of the computation
into the result buffer. Instead, it is up to the caller to pick up the correct
result buffer by asking buffer assignment (which here would be the buffer where
infeed wrote its second tuple component).
Thread local computations are all the other nested computations except fusion,
e.g. computations used by kMap and kReduce.
Parameter and result buffers for these calls are assigned a "thread local"
BufferAllocation::Slice, which in XLA:CPU is mapped to allocas. Since these are
not static addresses, we *do* need to pass in parameter and result buffers. The
output is written to the result buffer by "allocating" the storage for the
root into the result buffer passed in by the caller.
There are two cleanup items that I kept out of this CL to make reviews easier:
- We should rename "temps" to something more generic, like "buffer_table".
  I'll do that in a followup CL.
- We should use GatherComputationsByAllocationType from buffer_assignment.cc to
  CHECK that we use thread local calls for thread local callees and global
  calls for global callees.
PiperOrigin-RevId: 206843794
Plan is to move everything in xla/client/xla_client up to xla/client and remove
the directory.
No functional change.
PiperOrigin-RevId: 206055680
PiperOrigin-RevId: 198949796
PiperOrigin-RevId: 197564506
* Fix issue #15588 by simplifying the code
The allocator.h code tried to be clever and use 32-byte alignment for
SSE/AVX2/etc. use, and 64-byte alignment for AVX512.
Unfortunately, the #ifdef in use (from Eigen) is not useful; the bazel BUILD
files do not propagate the tf_copts() compiler flags when the
allocator.cc/allocator.h files get compiled, so Eigen does not see the actual
AVX512-enabling compiler flags.
Rather than changing compiler-flag propagation throughout a whole bunch of code,
there's an opportunity to just simplify the code and always use 64-byte
alignment. Yes, it wastes a bit of space, but on the other hand these
allocations are now cache-line aligned, which isn't a bad thing, and an #ifdef
can be dropped.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
* Set EIGEN_MAX_ALIGN_BYTES=64
This patch sets a 64-byte upper bound on the alignment of memory allocated by
Eigen. This is necessary to prevent crashes during the execution of the unit
tests when they are compiled with AVX512 support.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
* Update Tensor.Slice_Basic for 64 byte alignment
The test case
//tensorflow/core:framework_tensor_test:Tensor.Slice_Basic
fails with EIGEN_MAX_ALIGN_BYTES set to 64. The reason is that the
slices it takes of the sample tensor are 32-byte, not 64-byte, aligned.
This commit increases one of the dimensions of the original tensor to
ensure that the slices taken by the test cases are indeed 64-byte
aligned.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
* Update ScopedAllocatorConcatOpTest.Reshape for 64 byte alignment
The ScopedAllocatorConcatOpTest.Reshape test requires that the elements
of the field_shapes parameter of ExecOp are multiples of
Allocator::kAllocatorAlignment in size. If they are not, the backing
tensor allocated by PrepOp will have too many elements and reshaping
will fail. This commit modifies the test case, making the elements
64 bytes in size, the new value for Allocator::kAllocatorAlignment.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
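The arithmetic behind these test updates is simply rounding allocation sizes up to the alignment boundary; a minimal sketch (the constant name is borrowed from the commit message, the helper is hypothetical):

```python
KALLOCATOR_ALIGNMENT = 64  # Allocator::kAllocatorAlignment after the change

def align_up(nbytes, align=KALLOCATOR_ALIGNMENT):
    # Round up to the next multiple of `align`, so every allocation
    # starts on a 64-byte (cache-line) boundary.
    return (nbytes + align - 1) // align * align
```

Under this scheme a 1-byte request consumes a full 64-byte slot, which is the small space cost the first commit accepts in exchange for dropping the #ifdef.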
PiperOrigin-RevId: 196879933
PiperOrigin-RevId: 196742598
PiperOrigin-RevId: 196691101
in reshaped form, instead allowing XLA devices to keep all tensors in a reshaped form outside an XLA computation.
PiperOrigin-RevId: 196683444
- Don't display ops with 0 optimal seconds and 0 actual cycles. These
are ops that were expected to be free and were actually free.
- Fix HloCostAnalysis to mark parameters, constants, and
get-tuple-element as expected-to-be-free per the definition above.
- Allow optimal-seconds < 0 to indicate "I don't know". Use this for
custom calls, and then hide such ops from the "seconds above the
optimum" table.
- Don't display "<none>" and "<unknown>" -- instead, just display the
empty string. Less visual noise.
- Instead of showing ~5 ops per category in the categories tables, show
everything. This isn't so noisy now that we're hiding "free" ops, and
it makes finding optimization opportunities much easier.
PiperOrigin-RevId: 196564177
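The first and third bullets amount to filters over per-op rows; a sketch under assumed field names (the real HLO profile printer's schema may differ):

```python
def rows_to_display(ops):
    # Hide ops that were expected to be free (0 optimal seconds) and
    # actually were free (0 actual cycles).
    shown = []
    for op in ops:
        if op["optimal_seconds"] == 0 and op["actual_cycles"] == 0:
            continue  # free as expected: pure visual noise
        shown.append(op)
    return shown

def seconds_above_optimum_rows(ops):
    # optimal_seconds < 0 means "I don't know" (e.g. custom calls);
    # such ops are hidden from the "seconds above the optimum" table.
    return [op for op in ops if op["optimal_seconds"] >= 0]
```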
PiperOrigin-RevId: 195745718
- xla::ComputationBuilder -> xla::XlaBuilder
- xla::ComputationDataHandle -> xla::XlaOp
- xla::Computation -> xla::XlaComputation
- xla::CompileOnlyClient::AotComputationInstance -> xla::CompileOnlyClient::AotXlaComputationInstance
- xla::SessionModule -> xla::HloSnapshot
PiperOrigin-RevId: 194874462
This CL extends the --xla_hlo_profile knob to tfcompile. tf_library rules can
now set enable_xla_hlo_profiling to True to:
- Have the generated code update per-HLO profile counters as it executes.
- Have tfcompile generate and serialize an instance of HloProfilePrinterData
  with the compiled model; it can be used to pretty-print the collected
  profile counters.
PiperOrigin-RevId: 194627272
PiperOrigin-RevId: 194031845
PiperOrigin-RevId: 194010749
PiperOrigin-RevId: 193929733
This will require a version bump in workspace.bzl
PiperOrigin-RevId: 193052084
PiperOrigin-RevId: 192770717
PiperOrigin-RevId: 192768744
//third_party/tensorflow/compiler/tf2xla/kernels.
Enable type DT_STRING for AssertOp and ConstOp, in order to make a dummy Assert compile with a const string (the assert message) as its input.
PiperOrigin-RevId: 192695938
PiperOrigin-RevId: 190878279
tensorflow::str_util equivalents.
This will allow the deprecated methods to be removed.
PiperOrigin-RevId: 190650553
PiperOrigin-RevId: 190051589