| Commit message | Author | Age |
The format used is as follows:
{{node <node_name>}}
PiperOrigin-RevId: 206370355
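As a hedged sketch (the helper names below are illustrative, not TensorFlow's actual functions), attaching and stripping a marker in that format might look like:

```python
import re

def tag_error(message, node_name):
    # Prepend the node marker in the {{node <node_name>}} format.
    return "{{{{node {}}}}} {}".format(node_name, message)

def strip_node_tag(message):
    # Remove a leading {{node ...}} marker, if present.
    return re.sub(r"^\{\{node .*?\}\} ", "", message)

tagged = tag_error("shape mismatch", "MatMul_1")
```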
This is simpler than the corresponding change to XLA:GPU because on XLA:CPU all
instructions are codegened so we can always embed a pointer to the constant
global variable directly in the generated LLVM IR.
PiperOrigin-RevId: 206363887
PiperOrigin-RevId: 206362555
PiperOrigin-RevId: 206361654
PiperOrigin-RevId: 206354203
PiperOrigin-RevId: 206352708
No functional change.
PiperOrigin-RevId: 206352602
PiperOrigin-RevId: 206347779
The cubin is only non-empty if we were able to run ptxas. If the PTX is going to be
JIT-compiled by the driver, no cubin will be around, and loading an empty cubin
results in a fatal error.
PiperOrigin-RevId: 206341931
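The guard described above can be sketched as follows (all names here are hypothetical stand-ins, not XLA's real API):

```python
def load_gpu_binary(cubin, ptx, load_cubin, jit_ptx):
    # The cubin is only present when ptxas ran at compile time.
    # Loading an empty cubin would be a fatal error, so fall back
    # to letting the driver JIT the PTX instead.
    if cubin:
        return load_cubin(cubin)
    return jit_ptx(ptx)

result = load_gpu_binary(b"", ".version 6.0 ...",
                         load_cubin=lambda c: ("cubin", c),
                         jit_ptx=lambda p: ("jit", p))
```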
PiperOrigin-RevId: 206341656
XLA:GPU uses a custom-call with window/dim_labels to represent a call to
cudnn.
PiperOrigin-RevId: 206339219
PiperOrigin-RevId: 206338966
PiperOrigin-RevId: 206335619
When not compiled with "--config=opt", or when compiling with "--config=opt --distinct_host_configuration=false" (to skip host-specific optimizations), the following code incurs casting overhead even when T == U:

    y.reshape(rest_by_depth).device(d) = x_shifted.template cast<T>();

The fix: explicitly avoid calling cast<T>() when T == U.
PiperOrigin-RevId: 206332285
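The same guard can be sketched in NumPy terms as an analogy to the Eigen fix (the helper name is made up): when source and destination element types already match, skip the cast entirely.

```python
import numpy as np

def write_shifted(x_shifted, out_dtype):
    # Analogue of the fix: only pay for a cast when the element
    # types actually differ (T != U).
    if x_shifted.dtype == out_dtype:
        return x_shifted           # no copy, no cast
    return x_shifted.astype(out_dtype)

x = np.ones(4, dtype=np.float32)
same = write_shifted(x, np.float32)    # returns x itself, unchanged
diff = write_shifted(x, np.float64)    # a genuine cast
```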
When loading large tensors, the cost of creating a new BundleReader is small relative to the load time for the Tensor. When reading from network storage, using a threadpool for large tensor loads allows us to push expensive operations (alloc, fetch, checksum) to separate cores.
PiperOrigin-RevId: 206330021
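A minimal sketch of that load strategy, using Python's standard thread pool (the threshold and reader callback are invented for illustration, not TensorFlow's actual BundleReader interface):

```python
from concurrent.futures import ThreadPoolExecutor

LARGE_TENSOR_BYTES = 1 << 20  # illustrative threshold

def load_tensors(entries, read_tensor, pool_size=4):
    # Large tensors are fetched in parallel, pushing the expensive
    # alloc/fetch/checksum work onto separate cores; small tensors
    # are read inline, where a dedicated reader would cost more
    # than it saves.
    large = [k for k, n in entries if n >= LARGE_TENSOR_BYTES]
    small = [k for k, n in entries if n < LARGE_TENSOR_BYTES]
    result = {k: read_tensor(k) for k in small}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {k: pool.submit(read_tensor, k) for k in large}
        result.update({k: f.result() for k, f in futures.items()})
    return result

loaded = load_tensors([("w", 4 << 20), ("b", 128)],
                      read_tensor=lambda key: key.upper())
```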
PiperOrigin-RevId: 206327963
PiperOrigin-RevId: 206325816
PiperOrigin-RevId: 206325357
PiperOrigin-RevId: 206323345
PiperOrigin-RevId: 206320196
PiperOrigin-RevId: 206318440
PiperOrigin-RevId: 206289143
PiperOrigin-RevId: 206281287
When running with multiple devices, using the wrong context will lead to
a check-fail when trying to set a stream that has been created with a different
context.
This resolves a check-fail on resnet50 with 8 GPUs.
PiperOrigin-RevId: 206274741
instead of Run(), to avoid leaving behind non-GC'ed state after model initialization.
PiperOrigin-RevId: 206266841
PiperOrigin-RevId: 206265356
in eager execution (when possible).
Also move `build` implementation for subclassed networks from Model to Network (where it belongs) and slightly refactor it to minimize code duplication.
PiperOrigin-RevId: 206260286
PiperOrigin-RevId: 206252639
building
Previously, the first Model build after load_weights (e.g. a predict()) would trigger restore ops, and any variables added later (e.g. slot variables from an added optimizer) would not be restored when graph building. This change makes behavior consistent between eager execution and graph building by running new restore ops as they come in.
PiperOrigin-RevId: 206251879
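The restore-as-they-come-in behavior can be sketched with a toy checkpoint object (class and method names are hypothetical, not the Keras API):

```python
class DeferredCheckpoint:
    """Sketch: saved values are applied immediately to variables that
    already exist, and replayed for variables created later (e.g.
    optimizer slot variables added after load_weights), giving the
    same behavior whether executing eagerly or building a graph."""

    def __init__(self, saved_values):
        self.saved = dict(saved_values)
        self.variables = {}

    def create_variable(self, name, initial):
        # A new restore op runs as soon as the variable appears,
        # instead of only at the first Model build.
        self.variables[name] = self.saved.get(name, initial)
        return self.variables[name]

ckpt = DeferredCheckpoint({"kernel": 1.5, "momentum/kernel": 0.9})
k = ckpt.create_variable("kernel", 0.0)           # restored at build
m = ckpt.create_variable("momentum/kernel", 0.0)  # slot added later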
PiperOrigin-RevId: 206249977
PiperOrigin-RevId: 206249965
PiperOrigin-RevId: 206245967
This CL teaches XLA:GPU to use "normal" buffer assignment for constant
instructions. Constant instructions are mapped to a BufferAllocation, like all
other instructions, except the storage for this buffer is allocated statically
as a global in the generated PTX.
This CL does not change how we access the constants -- in
IrEmitterUnnested::BuildKernelThunk (used for top level computations) and in
HloToIrBindings::EmitBasePointersForHlos (used for nested computations) we bind
the kConstant instructions to the llvm::GlobalVariable backing them. So users
of constant instructions still access the globals corresponding to the constants
directly.
However, we no longer emit the constant literals inline. Instead we emit a
constant with a zero initializer and then memcpy in the contents of the literal
when we load the CUBIN/PTX. This works around compile time issues in LLVM and
ptxas caused by large constants.
We also populate `BufferAllocations` with the device pointers for the constant
globals. This is at least needed for TupleThunk today because TupleThunk wants
the addresses for the sub-buffers on the host. I'm not sure if there are other
places in XLA:GPU that rely on there being an entry in BufferAllocations for
every BufferAllocation.
PiperOrigin-RevId: 206243319
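The zero-initializer-plus-memcpy scheme described above can be sketched in miniature (this is an analogy in Python, not XLA's actual classes):

```python
class CompiledModule:
    """Sketch: each constant gets a zero-initialized global in the
    compiled image (cheap for LLVM/ptxas even when the constant is
    huge); the literal bytes are copied in only at load time."""

    def __init__(self, constant_sizes):
        # "Codegen": emit zero initializers, never the literals.
        self.globals = {name: bytearray(size)
                        for name, size in constant_sizes.items()}

    def load(self, literals):
        # "Load time": memcpy each literal into its global, and
        # hand back pointers for BufferAllocations-style lookups.
        for name, data in literals.items():
            self.globals[name][:len(data)] = data
        return {name: memoryview(buf)
                for name, buf in self.globals.items()}

mod = CompiledModule({"c0": 4})
ptrs = mod.load({"c0": b"\x01\x02\x03\x04"})
```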
so that we do not require sample_weight to be set during training/eval
PiperOrigin-RevId: 206242625
PiperOrigin-RevId: 206240947
Eventually (when TuplePointsToAnalysis is removed), there will be only one implementation left.
Also, use early return instead of else-if to make the code less indented.
PiperOrigin-RevId: 206240067
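The early-return shape mentioned above, sketched generically (the cases are invented for illustration):

```python
def alias_kind(instr):
    # Early returns instead of an else-if chain: each case exits
    # immediately, so later cases stay at the top indentation level.
    if instr.get("is_parameter"):
        return "parameter-alias"
    if instr.get("is_constant"):
        return "constant-alias"
    if instr.get("is_tuple"):
        return "tuple-alias"
    return "no-alias"
```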
PiperOrigin-RevId: 206238991
PiperOrigin-RevId: 206237934
This change also switches `padded_batch_and_drop_remainder` to use the corresponding fused op.
PiperOrigin-RevId: 206236616
PiperOrigin-RevId: 206236233
PiperOrigin-RevId: 206235660
PiperOrigin-RevId: 206235264
PiperOrigin-RevId: 206224062
as the Cast op.
PiperOrigin-RevId: 206218592
AllUsersConsumeBF16() incorrectly used ValueTypeAfterChange() for the current value being checked; it should use the original type.
Also, the fusion computation should be adjusted as soon as the fusion root is adjusted.
Redundant work for while computations has also been removed.
PiperOrigin-RevId: 206216822
Only transpose and broadcast are valid. I think this used to work because we
didn't emit cublas calls for fused dots until recently.
PiperOrigin-RevId: 206213730
PiperOrigin-RevId: 206211243
This is safe because all ops which write to resource variables check whether
there are other outstanding references to the buffer and copy if that's the
case. So we can safely reuse the buffer of initializer tensors even in weird
cases such as initializing from a constant (which should never be mutated)
or using the same tensor to initialize multiple variables.
PiperOrigin-RevId: 206211065
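The check-references-then-copy rule this relies on can be sketched with a toy refcounted buffer (a minimal sketch; the classes and functions are invented, not TensorFlow's implementation):

```python
class Buffer:
    def __init__(self, values):
        self.values = list(values)
        self.refs = 1  # outstanding references

def assign(var_buffers, name, buf):
    # Initialization aliases the tensor's buffer instead of copying.
    buf.refs += 1
    var_buffers[name] = buf

def write(var_buffers, name, index, value):
    # Writers check for other outstanding references and copy first,
    # so aliased initializers (constants, tensors shared between
    # variables) are never mutated in place.
    buf = var_buffers[name]
    if buf.refs > 1:
        buf.refs -= 1
        buf = Buffer(buf.values)
        var_buffers[name] = buf
    buf.values[index] = value

init = Buffer([0, 0])
vars_ = {}
assign(vars_, "a", init)
assign(vars_, "b", init)   # same tensor initializes two variables
write(vars_, "a", 0, 5)    # copies first; "b" and init stay intact
```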
PiperOrigin-RevId: 206209252
PiperOrigin-RevId: 206208637