| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
| |
Verifies correct error termination when the first RecvBuf
request fails.
PiperOrigin-RevId: 210993543
|
|
|
|
| |
PiperOrigin-RevId: 210991606
|
|
|
|
| |
PiperOrigin-RevId: 210984489
|
|
|
|
|
|
| |
This prevents TFE from trying and failing to run them on GPUs.
PiperOrigin-RevId: 210975652
|
|
|
|
|
|
|
|
|
| |
Changing dtype to any type other than the default causes a crash because
decode_jpeg and decode_image always promise to return uint8, while
decode_raw actually varies its return type. This type mismatch causes
tf.case to fail and makes the dtype parameter unusable.
PiperOrigin-RevId: 210975290
|
|
|
|
| |
PiperOrigin-RevId: 210975274
|
|
|
|
|
|
|
| |
Besides general resilience to arbitrary user code, another motivation is that it
still makes sense to use the hook even when there are no summaries in the graph, for the side effect of writing out the graph summary.
PiperOrigin-RevId: 210975165
|
|
|
|
|
|
| |
The replacement for the initializer_list overload is a bit sad because MakeSpan doesn't understand initializer_list (and we don't have CTAD yet)
PiperOrigin-RevId: 210974939
|
|
|
|
|
|
| |
Previously, if neither the --debug nor the --tensorboard_debug_address flag was used, the example would error out because the variable debug_hook was unset.
PiperOrigin-RevId: 210973500
|
|
|
|
|
|
|
| |
This is necessary now that outfeed instructions return a token[]; otherwise
users of said token can't find it via GetEmittedValueFor.
PiperOrigin-RevId: 210970118
|
|
|
|
| |
PiperOrigin-RevId: 210965673
|
|
|
|
|
|
| |
Queue runners will be removed in TensorFlow 2.0. They have been replaced with `tf.data` input pipelines, which provide a more efficient version of the same functionality.
PiperOrigin-RevId: 210964268
|
|
|
|
|
|
| |
To avoid potentially narrowing issues.
PiperOrigin-RevId: 210961574
|
|
|
|
| |
PiperOrigin-RevId: 210957416
|
|
|
|
| |
PiperOrigin-RevId: 210956850
|
|
|
|
| |
PiperOrigin-RevId: 210956690
|
|
|
|
|
|
|
| |
This will allow the functional tf.while_loop proposed in https://github.com/tensorflow/community/pull/13 to achieve feature parity with the current implementation.
Lowering is performed only when the "_lower_using_switch_merge" attr is set to True.
PiperOrigin-RevId: 210956432
|
|
|
|
|
|
| |
Without this change, CollectiveOps in functions cannot execute.
PiperOrigin-RevId: 210955255
|
|
|
|
| |
PiperOrigin-RevId: 210954903
|
|
|
|
| |
PiperOrigin-RevId: 210950778
|
|
|
|
| |
PiperOrigin-RevId: 210950150
|
|
|
|
| |
PiperOrigin-RevId: 210948369
|
|
|
|
| |
PiperOrigin-RevId: 210947198
|
|
|
|
|
|
|
|
| |
There are several API migrations happening:
* ArraySlice's sub-slice constructor => .subspan
* MutableArraySlice's container pointer constructor => absl::MakeSpan
PiperOrigin-RevId: 210946124
|
|
|
|
| |
PiperOrigin-RevId: 210945714
|
|
|
|
| |
PiperOrigin-RevId: 210934704
|
|
|
|
|
|
|
|
|
|
| |
SparseSoftmaxCrossEntropyWithLogits
See https://github.com/tensorflow/tensorflow/blob/065f9b833ffbb3b2f03d63febb186275674ba133/tensorflow/python/ops/nn_grad.py#L482
Should help with #20218
PiperOrigin-RevId: 210933185
|
|
|
|
|
|
| |
This closes one API hole between TensorList and TensorArray
PiperOrigin-RevId: 210932049
|
|
|
|
|
|
| |
than the storage limit.
PiperOrigin-RevId: 210930360
|
|
|
|
|
|
| |
The COMPUTE_CAPABILITIES var gets set by our build scripts, and the default here probably doesn't actually work if left alone.
PiperOrigin-RevId: 210929650
|
|
|
|
| |
PiperOrigin-RevId: 210929192
|
|
|
|
| |
PiperOrigin-RevId: 210928667
|
|
|
|
| |
PiperOrigin-RevId: 210927561
|
|
|
|
| |
PiperOrigin-RevId: 210927458
|
|
|
|
| |
PiperOrigin-RevId: 210925718
|
|
|
|
|
|
| |
to make it available to other fusion passes.
PiperOrigin-RevId: 210909961
|
|
|
|
| |
PiperOrigin-RevId: 210897377
|
|
|
|
|
|
| |
resulting reduce kernel suffers from poor data locality.
PiperOrigin-RevId: 210894866
|
|
|
|
|
|
|
|
| |
pass-specific classes: LayoutsAreReduceInputFusionFriendly
This is the first of a series of CLs. The idea is to extract and generalize the IsFusable... logic scattered across GpuInstructionFusion, FusionMerger, and GpuMultiOutputFusion into a shared utility. Eventually, functions like LayoutsAreReduceInputFusionFriendly will no longer be directly exposed, but rather used internally to determine the fusibility of given ops.
PiperOrigin-RevId: 210888913
|
|
|
|
| |
PiperOrigin-RevId: 210870855
|
|
|
|
| |
PiperOrigin-RevId: 210870412
|
|
|
|
|
|
| |
variable_scope, so that it doesn't assume it is called in the "root" scope of variables.
PiperOrigin-RevId: 210866643
|
|
|
|
|
|
| |
tensorflow.bzl
PiperOrigin-RevId: 210859798
|
|
|
|
| |
PiperOrigin-RevId: 210857377
|
|
|
|
|
|
| |
arguments to the constructor to remove some boilerplate.
PiperOrigin-RevId: 210855509
|
|
|
|
| |
PiperOrigin-RevId: 210855008
|
|
|
|
|
|
|
|
| |
references.
+ Capitalize default version hint.
PiperOrigin-RevId: 210846291
|
| |
GPU memory allocation can be done in one of two modes: efficient (but
complex and therefore somewhat risky) or conservative (simpler, but less
efficient). The main difference is that 'efficient' allocation allows
the same memory area to be allocated to multiple independent uses
simultaneously, on the premise that those uses will in fact be serial
and thus temporally disjoint, while 'conservative' allocation always
obeys the invariant that one piece of memory is allocated to at most
one use at any point in time.
If GPUDevice::RequiresRecordingAccessedTensors() returns false, then
the TF runtime uses efficient memory allocation for GPU ops. That is, GPU
ops are nominally synchronous and their tensor Refs are deleted
immediately after the op returns, although really the corresponding GPU
kernel is only guaranteed to have been enqueued on the compute stream
and may not yet have begun execution.
If RequiresRecordingAccessedTensors() returns true, then conservative
memory allocation is used, i.e. Refs on the tensors accessed by a GPU op
are held until the corresponding kernel is guaranteed to have completed
execution and no part of the op will touch them again.
Efficient GPU memory allocation should be safe when the following criteria
are all met:
1. All GPU kernels are executed serially on a single compute stream.
2. All GPU kernel outputs and temp buffers are allocated by
the GPU Op in the executor thread in which it is originally called.
3. Any read of a GPU tensor computed by a GPU kernel that is not
by another kernel on that same GPU first synchronizes on
the compute stream that produced it.
4. Any read by a GPU kernel of a value that was not produced by another
GPU kernel first synchronizes on the entity that produced it,
e.g. a copy stream.
5. All direct allocations of GPU memory that are not for kernel outputs
or temp buffers are conservative in duration.
6. Any use of directly allocated GPU memory that is not part of a kernel
execution first synchronizes on the compute stream to ensure that
any prior granted uses of the same region have expired before this new use.
These conditions together should be sufficient for safety, and
correspond to established practice, though it may be possible to
contrive other sets of rules that are also sufficient.
Collective Ops for GPUs are unusual in that they are async (as TF
Ops) and they can directly allocate GPU memory in CPU threads that are
asynchronous to the launching executor thread. This CL corrects a
couple of subtle misuse errors related to conditions 2 and 6.
PiperOrigin-RevId: 210841522
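The contrast between the two modes can be illustrated with a minimal pure-Python sketch. Everything below is hypothetical (names like `Allocator` and `Stream` are not TF internals); it only models the key idea: in efficient mode the buffer ref is dropped at enqueue time, so the region can be re-granted while its kernel is still only enqueued, which is safe precisely because a stream executes kernels serially in enqueue order; in conservative mode the ref is held until the kernel is known to have completed.

```python
# Hypothetical sketch of the two allocation modes (not TF internals).

class Allocator:
    """Hands out a single reusable memory region."""
    def __init__(self):
        self.free_regions = ["region0"]

    def allocate(self):
        return self.free_regions.pop()

    def release(self, region):
        self.free_regions.append(region)

class Stream:
    """Executes enqueued kernels strictly in FIFO order (criterion 1)."""
    def __init__(self):
        self.pending = []

    def enqueue(self, kernel):
        self.pending.append(kernel)

    def synchronize(self):
        while self.pending:
            self.pending.pop(0)()

def run(mode):
    alloc, stream, log = Allocator(), Stream(), []
    buf = alloc.allocate()
    stream.enqueue(lambda: log.append(("write", buf)))
    if mode == "efficient":
        # Ref dropped at enqueue time: the region may be re-granted while
        # the kernel above is only enqueued. Safe because the stream is serial.
        alloc.release(buf)
    else:
        # Conservative: hold the ref until the kernel has completed.
        stream.synchronize()
        alloc.release(buf)
    buf2 = alloc.allocate()  # reuse of the same region
    stream.enqueue(lambda: log.append(("write", buf2)))
    stream.synchronize()
    return log

# Both modes observe the first write before the region is reused.
assert run("efficient") == run("conservative")
```

The bug class the CL describes corresponds to breaking the serial-stream assumption: if a kernel could run on a different stream, or memory were allocated from an asynchronous CPU thread (as collective ops do), the efficient-mode release above would no longer be safe.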
|
|
|
|
| |
PiperOrigin-RevId: 210836404
|
|
|
|
| |
PiperOrigin-RevId: 210832078
|