Commit message | Author | Age
|
benchmarking. At the moment, it returns a default config with only Grappler dependency optimizer disabled. Many benchmarks wrap the subgraph they want to time in control_flow_ops.group() to avoid including the overhead of copying the output back to the Python client in the measurement. In the graph, this only adds a control dependency between the subgraph output and the fetch node, which in turn (often) causes the dependency optimizer to turn all nodes in the graph into no-ops.
PiperOrigin-RevId: 216242463
|
self.test_session() has been deprecated in 9962eb5e84b15e309410071b06c2ed2d6148ed44, as its name confuses readers of the test. Moving to cached_session() instead, which is more explicit about:
* the fact that the session may be reused.
* the fact that the session is not closed even when a "with self.test_session():" block exits.
PiperOrigin-RevId: 212766976
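The two properties above can be illustrated with a minimal stand-in; `FakeSession` and `CachedSessionTestCase` are made-up names for this sketch, not the real TensorFlow test API:

```python
class FakeSession:
    """Stand-in for a TF session; records whether close() was called."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class _KeepOpen:
    """Context manager that deliberately does NOT close the session on exit."""
    def __init__(self, session):
        self.session = session

    def __enter__(self):
        return self.session

    def __exit__(self, *exc):
        return False  # no close() here


class CachedSessionTestCase:
    """Sketch of the cached_session() contract."""
    def __init__(self):
        self._session = None

    def cached_session(self):
        # Create the session once and hand the same one back on every call.
        if self._session is None:
            self._session = FakeSession()
        return _KeepOpen(self._session)


tc = CachedSessionTestCase()
with tc.cached_session() as s1:
    pass
with tc.cached_session() as s2:
    pass
assert s1 is s2        # the session is reused across with-blocks
assert not s1.closed   # and stays open after the with-block exits
```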
|
This way users of the class don't have to remember to capture each one manually to avoid premature deallocation and memory races for asynchronous op kernels.
* Add simple tests that run multiple ops concurrently for linalg ops that use CudaSolver.
* Put a lock around the calls to cusolverDn*getrs and cusolverDn*gesvd, which appear not to be thread-safe.
* Misc. cleanup in linalg GPU kernels.
I ran all the related tests 1000 times without failure. Before this change, tests for matrix_solve and svd would fail or hang occasionally.
PiperOrigin-RevId: 170557380
|
PiperOrigin-RevId: 165653891
|
from cuSolver and a hand-written matrix identity kernel, instead of the batched LU factorization from cuBlas, which is only suitable for small matrices.
Speedup measured on Titan X (Maxwell):
Shape            adjoint     Before      After      Speedup
-----------------------------------------------------------
(4, 4)           noadjoint   0.000204    0.000193      5.3%
(16, 16)         noadjoint   0.000360    0.000186     48.3%
(256, 256)       noadjoint   0.013830    0.003852     72.1%
(1024, 1024)     noadjoint   0.647639    0.015075     97.6%
(513, 4, 4)      noadjoint   0.000219    0.000192     12.3%
(513, 16, 16)    noadjoint   0.000293    0.000195     33.4%
(513, 256, 256)  noadjoint   0.120573    0.120175      0.3%
(4, 4)           adjoint     0.000201    0.000193      3.9%
(16, 16)         adjoint     0.000282    0.000185     34.4%
(256, 256)       adjoint     0.013028    0.003391     73.9%
(1024, 1024)     adjoint     0.647752    0.014341     97.7%
(513, 4, 4)      adjoint     0.000221    0.000197     10.8%
(513, 16, 16)    adjoint     0.000384    0.000205     46.6%
(513, 256, 256)  adjoint     0.131402    0.130616      0.6%
PiperOrigin-RevId: 164623298
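The idea above (factor the matrix once, then solve A·X = I against an identity right-hand side) can be sketched in pure Python with Gauss-Jordan elimination; this shows only the mathematical approach, not the cuSolver implementation:

```python
def invert(a):
    """Invert an n x n matrix by eliminating against an identity RHS."""
    n = len(a)
    # Augment A with the identity matrix: [A | I].
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of the augmented matrix is now A^-1.
    return [row[n:] for row in m]

inv = invert([[4.0, 7.0], [2.0, 6.0]])
# det = 10, so A^-1 = [[0.6, -0.7], [-0.2, 0.4]]
expected = [[0.6, -0.7], [-0.2, 0.4]]
assert all(abs(inv[i][j] - expected[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```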
|
Add missing op grouping in Cholesky benchmark.
PiperOrigin-RevId: 164281947
|
PiperOrigin-RevId: 156908010
|
* Use cuBlas calls to implement a GPU version of the matrix_inverse op.
* Refactor CudaSolver to make all the kernel launches asynchronous, and add a public interface "RegisterLapackInfoCheckerCallback", which registers a callback to be invoked when the statuses of one or more solver kernels passed to it have been copied back to the host.
* Refactor CholeskyOpGpu to use the updated API and make that kernel async too.
PiperOrigin-RevId: 155418113
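The callback-on-completion scheme can be sketched with Python's concurrent.futures; `fake_solver_kernel` and `check_lapack_info` are hypothetical names echoing the C++ interface described above, not the actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_solver_kernel(n):
    """Stand-in for an async kernel launch; returns a LAPACK-style info code."""
    return 0 if n % 2 == 0 else n   # nonzero info = factorization failed

reported = []

def check_lapack_info(future):
    # Invoked once the kernel's status is available ("copied back to the host").
    reported.append(future.result())

with ThreadPoolExecutor(max_workers=4) as pool:
    for n in range(6):
        pool.submit(fake_solver_kernel, n).add_done_callback(check_lapack_info)
# Leaving the with-block joins the workers, and each callback runs as soon as
# its future completes, so every status has been reported by now.
assert sorted(reported) == [0, 0, 0, 1, 3, 5]
```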
|
Change: 142080137
|
that now supports adjoint and batch matmul.
CL created by:
replace_string \
batch_matmul\\\( \
matmul\(
plus some manual edits, mostly s/adj_x/adjoint_a/ and s/adj_y/adjoint_b/.
Change: 139821372
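The mechanical rewrite can be reproduced with Python's re module; replace_string itself is an internal tool, so this is just an equivalent sketch:

```python
import re

def rename_batch_matmul(source):
    """Rewrite batch_matmul( calls to matmul(, as the CL's replace_string did."""
    out = re.sub(r"batch_matmul\(", "matmul(", source)
    # The manual follow-up edits: adj_x/adj_y become adjoint_a/adjoint_b.
    out = out.replace("adj_x", "adjoint_a").replace("adj_y", "adjoint_b")
    return out

src = "y = tf.batch_matmul(a, b, adj_x=True, adj_y=False)"
assert rename_batch_matmul(src) == "y = tf.matmul(a, b, adjoint_a=True, adjoint_b=False)"
```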
|
BatchCholesky
BatchCholeskyGrad
BatchMatrixDeterminant
BatchMatrixInverse
BatchMatrixSolve
BatchMatrixSolveLs
BatchMatrixTriangularSolve
BatchSelfAdjointEig
BatchSelfAdjointEigV2
BatchSvd
At this point, the non-batch versions were identical to the batch versions except for a check during shape inference restricting the matrix inputs to be rank 2.
NOTICE: This is a non-backwards compatible API change.
Change: 132692980
|
Change: 123900456
|
inverse of the adjoint of a matrix. This is convenient in various gradient computations where it also saves an explicit transpose op.
Implements gradients for batch_matrix_determinant and optimizes the implementation of the gradient for matrix_determinant.
Change: 119990836
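The determinant gradient relies on the identity d det(A)/dA = det(A) * A^{-T}, which is where the inverse-of-the-adjoint comes in. A finite-difference check on a 2x2 matrix (pure Python, illustrative only):

```python
def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

A = [[3.0, 1.0], [2.0, 5.0]]
d = det2(A)  # 13.0
# For 2x2, inv(A) = [[a11, -a01], [-a10, a00]] / det.
inv = [[A[1][1] / d, -A[0][1] / d],
       [-A[1][0] / d, A[0][0] / d]]
# Closed form: grad[i][j] = det(A) * inv(A)[j][i]  (i.e. det(A) * A^{-T}).
grad_closed = [[d * inv[j][i] for j in range(2)] for i in range(2)]

# Finite-difference gradient of det with respect to each entry.
eps = 1e-6
for i in range(2):
    for j in range(2):
        Ap = [row[:] for row in A]
        Ap[i][j] += eps
        fd = (det2(Ap) - d) / eps
        assert abs(fd - grad_closed[i][j]) < 1e-4
```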
|
Change: 114374558
|
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164
|
a few other changes.
Changes:
- Some improvements to convolution by using 32-bit indices by
  @benoitsteiner. Not all calls converted yet. Also some
  improvements to pooling by @benoitsteiner.
- Improvements to sparse matmul CPU implementation by Ashish
- Some fixes to warnings by @vrv
- Doc fixes to padding by @Yangqing
- Some improvements to Tensor wrappers by Eider
- Speed up of matrix inverse on CPU by Rasmus
- Add an example of doing image inference from a pre-trained model
by @petewarden.
- fixed formula in mnist example by nodir
- Updates to event accumulator by Cassandra
- Slight changes to tensor c api by @mrry
- Handling of strings in listdiff by Phil
- Fix negative fraction-of-queue-full stats by Frank
- Type-checking improvement to importer by Yaroslav
- logdir recursive search for Tensorboard by @danmane
- Session.run() checks for empty graph by Manoj
Base CL: 108013706
|
Changes:
- futurize --stage2 changes for Python 3 compatibility by @girving.
- Small updates to documentation by @vrv, schuster and others
- Account for failure of std::thread::hardware_concurrency by @ebrevdo.
- More changes for backwards-compatibility tests by Josh
- Updates to python op doc generation by Josh
- Added support for using the best-fit allocator via ConfigProto by @vrv.
- Rename LocalSession to DirectSession, since local was a bad name for
it.
- Enable tf.nn.moments() to work with tensors of unknown shape by @mrry.
GITHUB_ISSUE: 139
- Changes for Android build by Andrew.
Base CL: 107645181
|
|
TensorFlow is an open source software library for numerical computation
using data flow graphs.
Base CL: 107276108