| Commit message (Collapse) | Author | Age |

#8041
Change: 149202172

Change: 149170109

sub_makefile/android dir, and add libtensorflow_demo.so target to remove the necessity of using Bazel to build the demo.
Partial solution to #8059; a follow-up will add Gradle/Android Studio integration.
Change: 149167229

Change: 149165553

convolution to be false.
Change: 149161107

SavedModel.
Change: 149158386

Also add a data download step.
Will need to sync with "get_started/monitors.md".
Change: 149158319

If the argument is not provided, the time reversal is applied to all batch
entries the same way: time is reversed from 0 to max_time for each entry.
Change: 149155863
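
The entry above describes a sequence-reversal op with an optional per-entry lengths argument. A minimal pure-Python sketch of that behavior (the name `reverse_time` and the list-of-lists, batch-major layout are illustrative assumptions, not TensorFlow's actual API):

```python
def reverse_time(batch, seq_lengths=None):
    """batch: batch-major list of sequences, each of length max_time."""
    if seq_lengths is None:
        # No lengths given: reverse the entire time axis, the same
        # way for every batch entry.
        return [list(reversed(seq)) for seq in batch]
    out = []
    for seq, n in zip(batch, seq_lengths):
        # Reverse only the first n steps; leave trailing padding in place.
        out.append(list(reversed(seq[:n])) + list(seq[n:]))
    return out
```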

'adjust_brightness'. The current implementation (i.e. without clipping before conversion) introduces different behavior for images with different original data types, i.e. uint8 or float32. When converting back to the original data type, if the original type is uint8, there is an automatic clipping effect, since all underflow/overflow values are constrained to [0, 255]. But if the original data type is float32, no clipping happens.
Change: 149155199
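
The fix described amounts to clipping in float before converting back to the original dtype, so uint8 and float32 inputs saturate the same way. A hypothetical sketch on a normalized [0, 1] pixel list (the function name and layout are illustrative):

```python
def adjust_brightness(image, delta, clip=True):
    # Do the addition in float, then clip to the representable range
    # *before* converting back, so uint8 and float32 inputs see the
    # same saturation behavior.
    out = [px + delta for px in image]
    if clip:
        out = [min(max(px, 0.0), 1.0) for px in out]
    return out
```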

Change: 149152982

Use this to allow loading reductions saved with older graphdefs.
Change GraphConstructor to not increase the version when importing, but instead take the min of all versions.
Change: 149152437
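
The version rule described above can be sketched in a few lines (the function name is a hypothetical stand-in, not GraphConstructor's actual interface):

```python
def merged_graph_version(existing_version, imported_versions):
    # Instead of bumping the producer version on import, take the
    # minimum across all graphs being merged, so reductions saved
    # with older graphdefs can still be loaded.
    return min([existing_version] + list(imported_versions))
```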

Change: 149150316

Change: 149149782

to state that the input argument queue_closed_exception_types, if passed, should be of type 'tuple'.
Change: 149149270

subgraphs that use a Merge node. Back edges for merges are
not on the graph when it is imported by GraphConstructor.
Change: 149145899

remote_fused_graph_execute_op in order to cache shapes in RemoteFusedGraphExecuteInfo
Change: 149143066

g3doc/, now under examples/. Partial fix of #8029.
Change: 149142119

Change: 149132162

Also makes file upload retries use the common retry logic.
Change: 149131497

Change: 149127634

Change: 149118694

Change: 149108576

Change: 149105174

Change: 149085750

Change: 149084406

Change: 149075765

every call to TF_CHECK_OK().
Speeds up a microbenchmark added in this change to status_test.cc from
1.19 ns per TF_CHECK_OK (before the changes to status.{h,cc}) to
0.587 ns per TF_CHECK_OK (a 51% improvement).
Since the Status::operator== method generates quite a lot of code, and we
now avoid calling it, code size is considerably smaller, by about 352
bytes per TF_CHECK_OK. The size of the BM_TF_CHECK_OK routine in
status_test.cc is reduced from 699 bytes to 347 bytes, as measured by
nm --print-size --radix=d ...status_test_binary... | grep BM_TF
Change: 149073899
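
The optimization reads like replacing a full Status equality comparison on the success path with a cheap ok() test, so the expensive work happens only on failure. A Python sketch of that idea (the class shape is illustrative, not TensorFlow's actual C++ Status):

```python
class Status:
    def __init__(self, code=0, message=""):
        self.code = code
        self.message = message

    def ok(self):
        # Cheap fast path: a single integer test, rather than a
        # field-by-field comparison against an OK Status object.
        return self.code == 0

def check_ok(status):
    # Only the failure path pays for building the error message.
    if not status.ok():
        raise RuntimeError("Non-OK status: " + status.message)
```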

will enable Android Studio builds of the demo on Windows machines in follow-up changes.
Note that the resulting .so file is currently armeabi-v7a only and 43 MB, which will also be optimized in later CLs.
Change: 149073764

Change: 149072356

cublas 8 adds the cublasGemmEx function, which lets you specify an
explicit "algorithm" for the computation. This functions as an opaque
tuning hint to cublas.
This patch adds support for cublasGemmEx to StreamExecutor, and wires up
XLA's GemmThunk to use the new function.
This patch does not add GEMM autotuning support in TensorFlow proper,
only XLA.
Change: 149068961
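
An explicit algorithm argument like cublasGemmEx's enables a simple tuning loop: run each candidate once and keep the fastest. A sketch of that loop (run_gemm and the algorithm ids are hypothetical stand-ins, not the StreamExecutor or cuBLAS API):

```python
import time

def pick_gemm_algorithm(run_gemm, algorithms):
    # Time each candidate algorithm and keep the fastest. A real
    # autotuner would warm up and average over several runs.
    best_algo, best_time = None, float("inf")
    for algo in algorithms:
        start = time.perf_counter()
        run_gemm(algo)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_algo, best_time = algo, elapsed
    return best_algo
```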

Change: 149066697

Change: 149063035

Change: 149062390

Change: 149060929

Change: 149060568

when range_given is false.
Change: 149059522

Clean up matmul gradient test code and improve test coverage.
Change: 149057186

Change: 149051396

Change: 149050553

The choice of instruction ordering, and the minimization of fragmentation once
we've chosen an order, are two large inter-related factors with respect to
overall memory usage. The approach in this CL uses heuristics to do better on
both, but neither problem is completely solved.
To pick a better ordering (the larger factor), the approach is to try the
original list-scheduler based ordering, and to also try a DFS based ordering.
We pick the ordering that yields a smaller minimum memory, computed with the
simulated heap, ignoring fragmentation. Note that this is the absolute minimum
memory for a given ordering.
To minimize fragmentation, the approach is to run a heap simulation on temporary
buffers. We still try to re-use existing allocations when possible, but instead
of creating new allocations for temp buffers, we collect all the leftovers and
use a heap to pack them. The heap algorithm that gave the best results is "lazy
best-fit": a variant of traditional best-fit that sometimes delays offset
assignment until Free is called, in the hopes of yielding larger free chunks.
Here are some measurements of the temp buffer sizes for GNMT encoder training (a
stacked LSTM). Lower is better. I've tried various combinations of instruction
ordering and heap simulation, to show the joint impact of these two factors.
List-scheduler order, no heap simulation     33.33 GiB
List-scheduler order, with heap simulation   25.09 GiB
Minimized DFS order, no heap simulation      16.59 GiB
Arbitrary DFS order, no heap simulation      15.05 GiB (old)
Arbitrary DFS order, with heap simulation    12.57 GiB
Minimized DFS order, with heap simulation    11.71 GiB (new)
Note that the original list scheduler order is much worse than DFS on stacked
LSTMs, but (not shown here) is much better than DFS on convolutions like
Inception. Also note that heap simulation packs things tighter for all
instruction orders in this example, but to varying degrees.
Change: 149049028
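
The "absolute minimum memory for a given ordering" computed by the simulated heap is just the peak of the running live-byte total over the alloc/free events implied by that ordering, ignoring fragmentation. A minimal sketch of that computation and of the pick-the-smaller-peak ordering choice (event-list representation and names are illustrative, not XLA's actual data structures):

```python
def simulated_peak(schedule, sizes):
    # schedule: ("alloc", buf) / ("free", buf) events in instruction
    # order. The peak of the running total is the absolute minimum
    # memory for this ordering, ignoring fragmentation.
    live = peak = 0
    for op, buf in schedule:
        if op == "alloc":
            live += sizes[buf]
            peak = max(peak, live)
        else:
            live -= sizes[buf]
    return peak

def pick_ordering(candidate_schedules, sizes):
    # Try each candidate ordering (e.g. list-scheduler vs. DFS) and
    # keep the one whose simulated heap peak is smallest.
    return min(candidate_schedules, key=lambda s: simulated_peak(s, sizes))
```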

Change: 149048750

Change: 149047908

when batch size is 1. This resulted in reducing the wall time spent in the op by 42% during testing on a Pixel phone.
Change: 149047766

C++ microbenchmark results are now normalized by iters (like Python benchmarks).
Change: 149045367

Change: 149043862

Change: 149040924

Change: 149040750

element to contain wildcards.
Previously we did wildcard expansion only if a single element was given.
Change: 149039100
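
The behavior change above, expanding wildcards in every element rather than only when one element is given, can be sketched with stdlib glob-style matching (the function name and the fall-through for non-matching patterns are illustrative assumptions):

```python
import fnmatch

def expand_elements(patterns, available):
    # Expand wildcards in *every* pattern element, not only when a
    # single element is given. Patterns that match nothing are kept
    # verbatim here; real code might instead report an error.
    out = []
    for pat in patterns:
        matches = fnmatch.filter(available, pat)
        out.extend(matches if matches else [pat])
    return out
```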

errors.
Change: 149038920

Change: 149036604