| Commit message | Author | Age |
This fix addresses the issue raised in 2075, where
there was no implementation of `tf.unravel_index`.
The `tf.unravel_index` op could be useful in many places.
This fix adds `tf.unravel_index` as a CPU kernel. Note the `order`
argument of `np.unravel_index` has not been added yet.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
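The new op mirrors the semantics of `np.unravel_index` (minus the `order` argument, per the note above). A minimal NumPy sketch of what the op computes, assuming row-major (C) ordering:

```python
import numpy as np

# Flat indices into a hypothetical array of shape (7, 6), row-major order.
flat = np.array([22, 41, 37])

# Convert each flat index into a (row, col) multi-index.
rows, cols = np.unravel_index(flat, (7, 6))

# Each flat index i satisfies i == rows[k] * 6 + cols[k].
print(rows.tolist())  # -> [3, 6, 6]
print(cols.tolist())  # -> [4, 5, 1]
```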
* Disable AWS S3 virtual addressing
This fix is related to 16397 and 15159. The fix disables
the virtual addressing of AWS S3, as was suggested in the comment.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* Fix format issue.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* Add comment for the passed parameter of virtual addressing.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
PiperOrigin-RevId: 183662473
PiperOrigin-RevId: 183661140
* fix typos
Other than the root and parameters of a fusion computation, most other
instructions in a fusion computation don't have a layout. GTEs are an
exception; they should inherit their layout from their operand, which
must be another GTE or a parameter.
Previously LayoutAssignment left GTEs alone, assuming they came in with
the right layout. But this isn't correct, and in fact LayoutAssignment
cleared the layouts of every non-fused instruction before assigning them
for exactly this reason. If we'd done the same to fused instructions,
it would have caught this bug, so we make that change here as well. (We
simplify this loop by removing the check for kOutfeed -- outfeeds do not
produce a result, so there's no shape to keep.)
PiperOrigin-RevId: 183595627
For example the batch-norm ops return a tuple, and those values' layouts
are significant. We still hide the layout on tuples, since this can be
noisy.
PiperOrigin-RevId: 183594622
We create the ShapeVisitor once per pass pipeline. Without this change,
after our ShapeVisitor has checked an instruction, it will never again
check that instruction *or any of its transitive inputs*. Yikes.
PiperOrigin-RevId: 183593437
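This failure mode is generic to memoized visitors, not specific to XLA. A toy sketch (hypothetical names, not the actual ShapeVisitor code) of why a visited-set that outlives a single run silently skips an instruction and all of its transitive inputs on later runs:

```python
class ShapeChecker:
    """Toy visitor that validates graph nodes, memoizing visited nodes."""

    def __init__(self):
        self.visited = set()
        self.checked = []

    def check(self, node, inputs):
        # Bug pattern: if `node` was seen in a *previous* run, we return
        # early and never descend into its (possibly modified) inputs.
        if node in self.visited:
            return
        self.visited.add(node)
        for inp in inputs.get(node, []):
            self.check(inp, inputs)
        self.checked.append(node)


graph = {"c": ["b"], "b": ["a"], "a": []}

# Reusing one checker across runs: the second run checks nothing at all,
# even though a pass may have rewritten "a" or "b" in between.
checker = ShapeChecker()
checker.check("c", graph)
first = list(checker.checked)   # ["a", "b", "c"]
checker.checked.clear()
checker.check("c", graph)
second = list(checker.checked)  # [] -- "c" is memoized, inputs skipped

# The fix: construct a fresh visitor per run, so every node is re-checked.
fresh = ShapeChecker()
fresh.check("c", graph)
```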
PiperOrigin-RevId: 183558128
PiperOrigin-RevId: 183551521
* Update docs for installing CUDA/CUDNN
This fix addresses the issue raised in 16479 where
CUDA/CUDNN versions from the docs do not match TensorFlow v1.5.0.
From the Dockerfile and docker images ENV, the version of CUDA/CUDNN
for TensorFlow v1.5.0:
```
CUDA_VERSION 9.0.176
CUDNN_VERSION 7.0.5.15
```
This fix updates the doc so that the CUDA version is changed from `8.0` to `9.0`
and the CUDNN version from `6.0` to `7.0`.
This fix fixes 16479.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
PiperOrigin-RevId: 183514731
aliases.
PiperOrigin-RevId: 183495796
PiperOrigin-RevId: 183493603
PiperOrigin-RevId: 183491729
tensorflow/contrib/tpu/profiler/capture_tpu_profile.cc.
PiperOrigin-RevId: 183486778
PiperOrigin-RevId: 183479688
collected.
PiperOrigin-RevId: 183474367
be necessary any longer.
PiperOrigin-RevId: 183474194
Thread a generator through the functions for creating fake arguments so the same
generator can be reused, which avoids repeating the same data patterns for each
argument generated.
Also tweak the position-dependent biasing heuristic to create both positive and
negative numbers for small literals.
PiperOrigin-RevId: 183473588
ConvDiagonalFB blocks.
PiperOrigin-RevId: 183472440
//third_party/tensorflow/contrib/{factorization:kmeans_test,linear_optimizer:sdca_estimator_test}
These tests were getting flaky timeouts when run under asan, sometimes taking
longer than the 5 minute timeout. Increasing the shard count to 4 seems to be
sufficient to cause them not to time out.
PiperOrigin-RevId: 183470183
In a later change, the GPU backend will use this allocator to reserve
scratch memory when trying out different convolution algorithms during
compilation.
PiperOrigin-RevId: 183469579
* updating CUDA srcs for Makefile build to fix unsatisfied link error
* more makefile refactoring
The reduce precision support is cribbed from the CPU/GPU LLVM-emitted
implementation. The implicit broadcast pass removes any implicit broadcasts in
the module, replacing them with the equivalent explicit broadcast and reshape
instructions.
PiperOrigin-RevId: 183467648
PiperOrigin-RevId: 183467186
PiperOrigin-RevId: 183466905
PiperOrigin-RevId: 183465032
PiperOrigin-RevId: 183463264
Now with less build breakage!
PiperOrigin-RevId: 183458987
Branch 183446593
While tracking down the issue of timeouts when running THE ISOLATOR, it was
observed that NearComparator#ExpectLiteralsNear() could be optimized in the
case of matching layouts to not compute multi indexes.
In the process of tracking down timeouts in THE ISOLATOR, I had assumed that
time spent was dominated by either generating input data, executing the input
data on various backends, or comparing the data. Never assume you know where
the time is spent in a program; the profiler may surprise you.
After making that optimization and then profiling the code before and after, I
was surprised by the profile. Imagine the shock, horror, and disgust I
experienced when discovering that runs of THE ISOLATOR were dominated (45%) by
calls to Literal#ToString() in NearComparator#ExpectLiteralsNear() for huge
(>120 million elements) literals that failed comparisons. No wonder passing
shards of THE ISOLATOR were fast, and failing shards were slow.
Further, computing multi indexes many times is expensive enough (18%) to show
up in profiles, so avoid calculating it until it is necessary.
The optimizations in this patch:
* Don't call Literal#ToString() on huge literals that are going to get written
to disk anyways. The utility of printing said literal to stdout is suspect.
* Initialize NearComparator#miscompares_ to false, only update miscompares_ and
other stats when miscompare occurs.
* Split NearComparator#ExpectLiteralsNear() into two, since we only need to log
and update stats if an actual miscompare occurs.
* Add fast path in NearComparator#ExpectLiteralsNear() for case of matching
layouts, being careful not to compute multi index unless mismatch actually
occurs.
This optimized NearComparator#ExpectLiteralsNear() for the case of many element
literals, with few miscompares. For many miscompares, we cannot avoid
calculating multi indexes, but can fast path for equal layouts. For zero
miscompares, we can at least fast path in the case of matching layouts.
Before this CL, a run of THE ISOLATOR for a single literal with >120 million
elements and a few miscompares took 379s (6.3m). With this CL, the same test
case now takes 44s.
Beautiful flame graphs omitted from public commit message, regrettably.
PiperOrigin-RevId: 183451138
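The matching-layout fast path can be sketched outside XLA. A hypothetical NumPy version (not the NearComparator code; names and tolerances are illustrative) that walks the flat buffers directly and only reconstructs the expensive multi-index when a mismatch actually occurs:

```python
import numpy as np

def compare_near(expected, actual, abs_err=1e-4):
    """Compare two same-shape arrays; report mismatches by multi-index.

    Fast path for matching layouts: iterate the flat buffers, and compute
    the multi-index via np.unravel_index only for elements that miscompare.
    """
    mismatches = []
    exp_flat = expected.ravel()
    act_flat = actual.ravel()
    for i in range(exp_flat.size):
        if abs(exp_flat[i] - act_flat[i]) > abs_err:
            # Multi-index is computed lazily, only on an actual mismatch.
            mismatches.append(tuple(int(d) for d in
                                    np.unravel_index(i, expected.shape)))
    return mismatches

expected = np.zeros((2, 3))
actual = expected.copy()
actual[1, 2] = 1.0
print(compare_near(expected, actual))  # -> [(1, 2)]
```

With few or zero miscompares this touches only the flat buffers, which is the case the commit above optimizes.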
PiperOrigin-RevId: 183450369
The test was sometimes taking over six minutes to run in asan mode,
causing it to hit the 5 minute timeout. Setting the shard count
to 6 was insufficient, but setting it to 10 brought the runtime
down to about 3:30 in the worst case over 100 runs.
PiperOrigin-RevId: 183449941
vulnerability reporting process.
PiperOrigin-RevId: 183448435
1. Using _shape_tuple
2. Bypassing * over math_ops.mul etc
3. Flatmaps in the tape code
4. Cache for ones similar to for zeros
5. Fast path for _SubGrad
6. Fast global_step += 1 for resource variables
7. Bypassing deprecated args decorator in eager mode
PiperOrigin-RevId: 183446593
This makes chaining them easier. Control dependencies to ensure updates
happen are implicitly added by the function code.
PiperOrigin-RevId: 183446211
PiperOrigin-RevId: 183443656
did not exist in the external github TF repository.
PiperOrigin-RevId: 183443347
This fix fixes a build failure when compiling with
GCC 7.2.1 on AWS Linux 2:
```
gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)
```
The error output was:
```
...
./tensorflow/contrib/lite/toco/model.h:1567:25: error: 'std::function' has not been declared
void EraseArrays(std::function<bool(const string&)> discardable) {
.....
```
This fix is related to 16046.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
PiperOrigin-RevId: 183441321
PiperOrigin-RevId: 183438398
* Change `reduce_logsumexp` to internally use `reshape` rather than `squeeze`
since the latter requires the `axis` arg to be a Python `list`.
PiperOrigin-RevId: 183396533
* Kernel utils to support broadcast add and mul.
PiperOrigin-RevId: 183397494
* Updating sparsify_gather.
PiperOrigin-RevId: 183402917
* [tf.data] Move slow-path-related code into the slow path in IteratorHandleOp::Compute().
This slightly reduces the amount of work performed when an iterator is accessed (after the first access), and potentially reduces contention if concurrent steps are accessing the same iterator.
PiperOrigin-RevId: 183406221
* Cleanup: Ran clang-format on all *.{cc,h} under grappler.
PiperOrigin-RevId: 183406440
* Increase shard count of //third_party/tensorflow/python:nn_batchnorm_test to avoid timeouts
When run under asan, the test runs for about 5 minutes, and sometimes
longer, causing frequent timeouts.
This change increases the shard count of the test to 4, which brings the run time
of the longest running shard under asan to about 2 minutes.
PiperOrigin-RevId: 183414888
* Add available choices to toco flags and fix minor formatting issues.
PiperOrigin-RevId: 183415713
* Performance improvements to some GPU code to use shared locks instead of unique locks for some hotspot cases.
PiperOrigin-RevId: 183418559
* [XLA] Improve error message for bad slices.
PiperOrigin-RevId: 183420038
* Fix py3 build rules for all py tests under py2tf.
PiperOrigin-RevId: 183422144
* Fix bug with Operation._control_inputs setter.
PiperOrigin-RevId: 183422192
* Make softmax_op_test.py work with C API enabled.
PiperOrigin-RevId: 183422829
* Cleanup: Ran clang-format on all *.{cc,h} files in tensorflow/core/kernels.
PiperOrigin-RevId: 183423961
* Fix the documentation for the dense layer for how rank > 2 inputs are handled.
PiperOrigin-RevId: 183425868
* Cleanup: Ran clang-format on all *.{cc,h} in tensorflow/core/ops.
PiperOrigin-RevId: 183429339
PiperOrigin-RevId: 183435438
inputs with equal distributions.
PiperOrigin-RevId: 183435084
PiperOrigin-RevId: 183431139