Change: 127753127
couldn't find base<> template when creating custom filegroup to reduce binary size.
Change: 127751354
Update miscellaneous files to allow the vz- prefix.
Change: 127750985
- LookupTableFind - also validated default_value.
- LookupTableInsert - also added validation that keys and values are compatible.
- LookupTableSize
- LookupTableExport - also made the two outputs share the first dim.
- InitializeTable
- InitializeTableFromTextFile
Change: 127750785
Change: 127749101
Change: 127747773
transposition with a shape tensor, all static information is lost. This change
preserves shape if static shape is fully defined.
Change: 127746146
Make shape_inference::InferenceContext::Add support adding two dimensions.
Change: 127744929
Change: 127737446
Increased the number of samples taken, as well as tweaked the initial
parameters and tolerance, such that the test rarely fails (if I remove the
seed, the tests fail > 50% of the time).
Change: 127731468
Change: 127728304
training_ops.cc.
Change: 127727818
This triggers when save_path is a relative path pointing to a file named
"checkpoint", but does not trigger when it is an absolute path.
Change: 127727572
_WITH_TENSORS macros in shape_inference_testutil.h. Instead, pass def and
tensors through a new ShapeInferenceTestOp struct.
Change: 127715692
Change: 127715478
- Change ExecutorState::Entry to construct the Tensor val late, avoiding
  default construction and moving the tensor in favor of calling the
  constructor directly.
- Change DeviceContextMap to be vector, and make the check cheaper when no
  nodes are registered.
- Cache in Params whether the device requires registering tensor accesses.
Change: 127714854
Change: 127709092
Change: 127682508
Change: 127668670
Change: 127638036
Send/Recv paths:
o Allow SendOp and RecvOp implementations to directly use the string buffer
  contained in a Rendezvous::ParsedKey object, rather than allocating their
  own string object. Saves two allocations per Send/Recv pair.
o Use std::move in a few places to avoid copying a std::function object.
o Eliminated unused ParsedName variable declaration in
  DeviceNameUtils::ParseLocalName.
Change: 127630066
avoid creating a new one each time.
Change: 127624630
Change: 127608996
Change: 127606165
This is especially useful for signaling scenarios where the chief worker wants to signal the other workers to stop early or execute some action.
Change: 127603432
based on the nodedef and opdef.
Change: 127602218
Change: 127601718
There seems to be flakiness in this comparison. I believe it's a bug in the
while_loop gradients or in a missing control flow dependency in TensorArray
gradients.
Change: 127598540
Change: 127597496
Change: 127597133
Enables optimization passes in DirectSession.
Refactors SimpleClientGraph and SimpleGraphExecutionState to maintain a separate mutable function library per client graph.
Change: 127595477
bounding boxes, greedily selecting a subset of bounding boxes in descending order
of score, and pruning away boxes that have high intersection-over-union (IOU) overlap
with previously selected boxes.
Change: 127595411
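The greedy selection described in this entry can be sketched in plain Python. This is only an illustration under stated assumptions, not the op's implementation: the function names (iou, greedy_nms), the (y1, x1, y2, x2) box layout, and the default threshold of 0.5 are all hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (y1, x1, y2, x2) (assumed layout)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    # Visit boxes in descending order of score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) >= max_output_size:
            break
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(i)
    return selected
```

For example, given two heavily overlapping boxes and one distant box, the lower-scored of the overlapping pair is pruned and the other two are kept.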
- The method could have false positives but no false negatives.
Change: 127595141
- MatMulShape
- BiasAddShape
- NoOutputs
- ScalarShape
Change: 127592611
SDCAOptimizer is supported as the underlying optimizer for hinge_loss.
Change: 127590598
tensorflow/contrib/session_bundle.
Change: 127590512
Avoids dynamic batch/non-batch dispatching when it's not necessary. Also possibly works around a bug in cond(), but that's an issue for another CL.
Change: 127588880
TESTED:
- passed opensource_build: http://ci.tensorflow.org/job/tensorflow-cl-presubmit-multijob/2780/
Change: 127585603
method binding error.
Change: 127585524
- Break strided_slice op into multiple translation units by dimension
- Refactor code to allow instantiation from multiple translation units
  by using a free function instead of a class member
- strided_slice_op.cc used to take 163-180s. Now the aggregate of all
  translation units is 196s, but each strided_slice_op_inst_* takes
  24-34s
- Cast to a canonical type for each POD size to avoid redundant
  instantiations for, say, int32 and float32.
Change: 127578995
Documentation does not seem to be rendered properly.
All other uses of ```python have a blank line before them.
Change: 127574769
Change: 127570597
Change: 127568733
component, tf-chart-scaffold, responsible for this.
Update the tf-line-chart component to receive data from tf-chart-scaffold instead of talking to the backend directly.
Change: 127568346
(numpy arrays or tf Tensors), IndexedSlices, and TensorArrays.
Change: 127567260
Change: 127565651
as a reshape.
This is a performance optimization that is important for mobile inference graphs. They typically run with single examples (batch_size=1), so many transposes are performed on Tensors with only one dimension of size other than 1, and these amount to a reshape. In performance measurements on a Nexus5, this reduced the time spent in transpose nodes from around 1ms to 0.02ms for a 1x20000 tensor.
Change: 127564469
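Why such a transpose is just a reshape can be checked with a small sketch. The helper names below are hypothetical, and the condition tested is a slight generalization of the one named in this entry (it assumes a row-major layout): size-1 dimensions do not affect element order, so a transpose leaves the data untouched whenever the remaining dimensions keep their relative order. A tensor with a single non-1 dimension always satisfies this.

```python
def transpose_is_reshape(shape, perm):
    """True if transposing `shape` by `perm` leaves the row-major element order unchanged."""
    # Dimensions of size 1 do not affect the memory layout, so ignore them.
    non_unit = [axis for axis in perm if shape[axis] != 1]
    # The remaining axes must appear in their original relative order.
    return non_unit == sorted(non_unit)


def transposed_shape(shape, perm):
    """Shape of the result; when the check above holds, reshaping to it is enough."""
    return [shape[axis] for axis in perm]
```

For the 1x20000 case mentioned above, transpose_is_reshape([1, 20000], [1, 0]) holds, so the result is the same buffer viewed as 20000x1. A general 2x3 transpose does not qualify and still needs a real data shuffle.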
unittests.
This uncovered:
- a few flaky tests that started failing deterministically :). Their fixes are included
  in this CL.
- random_shuffle_queue with a shared_name always fails if a user provides a graph-level
  random seed. This is because the op-level seed returned from random_seed.get_seed
  changes per graph operation. The fix for this is also included in this CL.
While I was here, also added a test for random_seed, since the logic in there has
become a bit complicated.
Change: 127564438
EXPERIMENTAL: Insert special debug ops (e.g., DebugIdentity) into the graph for debugging. Currently, debug ops need to take exactly one input and have the string attribute "tensor_name" to indicate what tensor they watch.
For example, before the node insertion, the graph may look like:
A:0 -----------1----------> B
      |
      ---------2-----------> C
wherein the output slot 0 of node A feeds as the input to node B through
edge 1 and to node C through edge 2.
After the node insertion, assuming both B and C have non-Ref input, the graph becomes:
A:0 ---3---> Copy -----------4----------> B
                       |
                       ---------5--------> C
                       |
                       ---------6--------> X
If a node (e.g., B) has Ref input, the graph becomes:
          ----------------4---------------> B
          |
A:0 ---3-----> Copy -----------5----------> C
                         |
                         -----------6--------> X
In other words, nodes that take Ref inputs keep reading the original tensor; we do not feed deep copies to them.
The Copy node is the inserted deep-copy node that copies the input tensor on-device (e.g., CPU-to-CPU or GPU-to-GPU deep copy), which reduces the likelihood of racy updates during debug tensor-watching. X is the newly created debug node that transforms the input (copy of the watched tensor) into a debug signal.
DebugIdentity is the simplest debugging paradigm, in which the debug signal (i.e., X:0) equals the tensor itself. More sophisticated debug ops can be used to transform the tensor into other useful debug signals. An example is the added DebugNanCounter op.
If the nodes (A, B and C) are located on GPU and an edge from A to B or C is HOST_MEMORY, the CopyHost op will be used instead of the Copy op.
A reserved string attribute "debug_url" is created for the debug ops to make it possible to send debug signals to files or RPC calls in the future.
Other points worth noting:
* The debug ops have control-edge connections to the original destination node, in order to ensure that the debug signals are deterministically generated before the destination node executes.
* More than one debug op can be added to watch a tensor.
* A new field called "DebugTensorWatch" is added to RunOptions to support debug node insertion.
* A new method GPUUtil::CopyGPUTensorToSameGPU has been added to make GPU-to-GPU deep-copy of tensors possible.
* The two test files (debug_gateway_test.cc and debug_gateway_gpu_test.cc) have been consolidated into the former, by using the GOOGLE_CUDA macro.
Change: 127562075
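The rewiring described in this entry can be sketched on a toy edge list. This is a minimal illustration, not the actual graph pass: the tuple layout, the function name insert_debug_nodes, and the generated node names are hypothetical, and attaching a control edge to every original consumer (including Ref consumers) is an assumption.

```python
def insert_debug_nodes(edges, watched, debug_op="DebugIdentity"):
    """Rewire consumers of `watched` (e.g. "A:0") through a Copy node and add a debug node.

    `edges` is a list of (src_tensor, dst_node, dst_takes_ref) tuples.
    Returns (data_edges, control_edges).
    """
    if not any(src == watched for src, _, _ in edges):
        return list(edges), []

    copy_node = "Copy(%s)" % watched              # hypothetical node name
    debug_node = "%s(%s)" % (debug_op, watched)   # hypothetical node name
    copy_out = copy_node + ":0"

    # A:0 -> Copy, and Copy -> X (the debug node).
    data_edges = [(watched, copy_node, False), (copy_out, debug_node, False)]
    control_edges = []
    for src, dst, takes_ref in edges:
        if src != watched:
            data_edges.append((src, dst, takes_ref))   # unrelated edge, unchanged
            continue
        if takes_ref:
            data_edges.append((src, dst, True))        # Ref consumers keep the original edge
        else:
            data_edges.append((copy_out, dst, False))  # others read the deep copy
        # Assumed: the debug op must run before each original destination node.
        control_edges.append((debug_node, dst))
    return data_edges, control_edges
```

Calling insert_debug_nodes([("A:0", "B", False), ("A:0", "C", False)], "A:0") reproduces the second diagram above: B and C read from the Copy node, and the debug node X hangs off the same copy.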
tensorflow/contrib/session_bundle) and introduce forwarding headers instead.
This doesn't change any behavior since the copies are identical, but moves forward with the migration, avoids code duplication, and eliminates the possibility of this code diverging.
Change: 127556144