| Commit message | Author | Age |
Change: 126119692
Minor comment updates
Change: 126112363
left nav and index page (and fixes a URL in the wide overview). Also breaks the left nav and index page into sections, because the list of tutorials is getting long.
Change: 126105974
Branch 126082003
CUDA is linked in.
Also fix a typo in tf.test
Change: 126104437
Change: 126101067
Change: 126101038
Change: 126099328
some noHash logic for Selenium tests into URIStorage module, since this logic should be common to all components that store state in the URI hash.
Change: 126098513
crosstool_wrapper_driver_is_not_gcc (#3077)
Many superclusters need to compile TensorFlow from source due to an outdated glibc version (see #110). In @rdipietro's excellent workaround post (https://github.com/tensorflow/tensorflow/issues/110#issuecomment-219790730) he mentions issues with the referenced Python version in this file. I have issues as well, but of a different nature. In my case the build script is unable to find `libpython2.7.so.1.0`, since only Python 3 is present on my machine. The issue originates from `crosstool_wrapper_driver_is_not_gcc`, where the only Python 2.7-exclusive feature is the `print` statement. By importing `print_function` from `__future__`, the explicit dependency can be dropped and both versions of Python are supported.
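The fix described above can be sketched as follows; the function name and compiler flags here are illustrative, not the actual crosstool wrapper script:

```python
# On Python 2, `print` is a statement; this future import replaces it with the
# print() function, matching Python 3, so the same script runs under both.
from __future__ import print_function

def format_invocation(argv):
    # Build the log line separately so it can be inspected without
    # capturing stdout.
    return 'invoking compiler with: ' + ' '.join(argv)

# print() is now a function on Python 2 as well as Python 3.
print(format_invocation(['-O2', '-c', 'foo.cc']))
```

On Python 3 the `__future__` import is a no-op, so adding it costs nothing there.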
Change: 126091656
Change: 126091563
since some device stats can skew results.
Change: 126089107
Change: 126087206
Change: 126086117
Change: 126082003
fill_functor.h to its own compilation unit in fill_functor.cc.
Change: 126081539
Graphs.
Change: 126081162
Change: 126079889
Change: 126070272
* Fixed makefile problems with new files and zlib dependency
* Added basic Raspberry Pi example, and fixed math error in iOS examples
* Added initial camera example for Raspberry Pi
* Added model running to camera example
* Added Xcode version check, and logging utilities for kernel errors
Change: 126039263
Test distributed training of a wide&deep (census) model on a local k8s cluster.
With local_test.sh, the "--model-name CENSUS_WIDENDEEP" flag can be used to run this test, e.g.,
local_test.sh --model-name CENSUS_WIDENDEEP
The k8s tf workers launched from within the docker-in-docker container will have shared storage at "/shared", which is mounted using k8s hostPath.
The syntax to run the existing MNIST distributed test is not affected.
Change: 126030379
Clean up session_bundle/exporter to be importable.
Change: 126029457
* Add conv3d_transpose operation and register gradients for Conv3DBackpropInputV2 and Conv3DBackpropFilterV2.
* Fix indentation.
* Add a comment that says transpose of convolution is sometimes referred to as "deconvolution."
* Update formatting.
Useful to identify execution stages when looking at logs.
Change: 126022244
Change: 126013328
* Implement bidirectional_dynamic_rnn (#1779)
* Modify return values of bidirectional_dynamic_rnn
Instead of the concatenated `Tensor` of the forward and backward
outputs, it now returns them separately as a tuple.
Additionally, the forward and backward states are now returned
as a tuple.
* Enhance readability of bidirectional_dynamic_rnn
* Modified an ambiguous docstring in bidirectional_dynamic_rnn
* Fix typo in bidirectional_dynamic_rnn
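The new return convention can be sketched with plain lists and hypothetical step functions (a sketch of the interface shape only, not the actual TensorFlow API):

```python
def bidirectional_pass(sequence, step_fw, step_bw):
    """Run step_fw over the sequence and step_bw over its reverse.

    Returns (outputs_fw, outputs_bw) as a tuple rather than a single
    concatenated output, so callers decide how to combine the directions.
    """
    outputs_fw = [step_fw(x) for x in sequence]
    # Process the reversed sequence, then flip the results back into
    # the original time order.
    outputs_bw = list(reversed([step_bw(x) for x in reversed(sequence)]))
    return outputs_fw, outputs_bw

fw, bw = bidirectional_pass([1, 2, 3], lambda x: x + 10, lambda x: x * 2)
```

Returning the directions separately also mirrors how the forward and backward states are now returned as a tuple.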
Change: 126010736
This makes it easier to set properties such as the
`gpu_options.per_process_gpu_memory_fraction`, which have to be set on
the server rather than on individual sessions.
Fixes #3057.
Change: 126009942
handle per-thread buffer allocation for the tileable executor without resorting to `thread_local`, which is not fully supported on Android.
Change: 126009029
Fix Nightly Python3 error in graph_io_test.py
Change: 125997620
Fixes #3035.
Change: 125992640
done by models distributed across many devices. A small
microbenchmark model that runs two banks (A and B) of 30 nodes with a
30x30 full shuffle between them, where each of the nodes in A and in B
run with one node on each of the 30 devices (so 30*29+30+30, or ~930
separate RPCs) was showing ~111,000 allocations per iteration of the graph.
With the changes here, this is now down to ~64,300 allocations per iteration.
Changes include:
o DeviceContext::CopyDeviceTensorToCPU and related helper routines:
use StringPiece instead of const string& for the tensor name (avoids
creating a string in some cases where the caller only has a
StringPiece available).
o Change some Rendezvous and BaseRemoteRendezvous interfaces to
take a 'const Rendezvous::ParsedKey& key', rather than 'const string& key'.
In many cases, the callers were already having to parse the key
into a ParsedKey, and so we were doing the parsing multiple times at
different levels as we processed receiving or sending of a tensor. This
reduces the number of times that we parse a key as it flows from a Send
node through to a Recv node on another worker.
o Changed Rendezvous::ParsedKey so that it makes a copy of the underlying
full key, and then uses StringPiece objects to point into this copy for
the src_device, dst_device, and edge_name pieces. This turns 3 string
allocations into 1 per Rendezvous::ParseKey call.
o Added new StringPiece Rendezvous::ParsedKey::FullKey() accessor to
return a StringPiece for the underlying full key, and used that in a
few places (mostly logging) where that is useful.
o In many places, used std::move(function_variable) when assigning to
an instance variable. This eliminates a very large number of excess
std::function allocations/initializations (~56000 of the baseline
allocations were related to std::function setup or cloning, and this
is now down to ~11000 after this cl).
o In the RPC-based remote workers (StubbyRemoteWorker and
GrpcRemoteWorker), changed the code path in RecvTensorAsync to avoid
creation of a std::function with 6 arguments unless necessary. There
are three cases now handled separately:
(a) We're not logging, and we didn't make a copy of the request that we
need to free: just use the passed in 'StatusCallback done' object
directly, without creating a wrapper std::function object at all
(b) We're not logging, but we made a copy of the request that we
need to free: we create a simple wrapper std::function that
invokes the passed in 'done' callback, and then frees the
req_copy request copy object.
(c) We're logging: we create the std::function object with all the
necessary state to log when the recv has finished.
o Changed DeviceMgr::LookupDevice to take a StringPiece, rather than a
const string&, and changed the hash table to use StringPiece keys.
This allows clients that just have a StringPiece device name in their
hand to avoid a string creation to lookup the Device* object.
o Changed ExecutorState to use a specialized TaggedNodeReadyQueue that
internally uses a gtl::InlinedVector<TaggedNode, 16>, rather than
using a std::deque<TaggedNode> for keeping track of nodes ready to
execute. This is faster because it avoids allocations entirely if the
ready node queue doesn't get bigger than 16, and inlined vectors are
generally faster than std::deque, at a minor risk of using more memory
if this queue grows to very large numbers of ready nodes (mostly imaginable
only in pathological graphs).
o In ExecutorState::Process, allocated a single ExecutorState::AsyncState
object to keep track of all the state we need to preserve for an asynchronously
executed node, rather than keeping this state implicitly via a very large
number of arguments to a lambda function.
o Added new atomic std::atomic<bool> status_is_ok_ in
BaseRemoteRendezvous. This allows us to avoid acquiring the lock when
we just want to check if the status is non-OK in
BaseRemoteRendezvous::Send and BaseRemoteRendezvous::ValidateDevices.
o In GraphMgr::RunAllDone, changed assignment of args.runner to avoid
one extra level of std::function indirection (binding the function directly
to the ThreadPool::Schedule routine, rather than creating an intermediate
lambda function that invokes this inside the body of the lambda).
o Added freelist of RpcRecvTensorCall objects in
third_party/tensorflow/core/distributed_runtime/rpc/rpc_rendezvous_mgr.cc
o Changed third_party/tensorflow/core/framework/rendezvous.cc to keep the
hashtable of Item* objects keyed by uint64 (hash of the tensor name), rather
than the full-string tensor name. Collisions in the 64-bit hash space
should basically never happen.
o Sped up DeviceNameUtils::ParseFullName by optimizing for the common
ordering of parts of /job, /replica, /task, /device. The parsing code
was general enough to handle any order, but did so by comparing the
prefixes 4, 3, 2, and 1 times, respectively, rather than 1, 1, 1, and 1 times.
o Sped up DeviceNameUtils::SplitDeviceName to avoid extra string copies.
Change: 125991891
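The parse-once idea in the Rendezvous bullets above can be sketched like this, using a simplified, hypothetical key format of `src;dst;edge` (not TensorFlow's real rendezvous key) and hypothetical send/recv helpers:

```python
from collections import namedtuple

# Parse the rendezvous key ONCE into a structured object and hand that object
# down through the send/recv layers, instead of re-parsing the raw string at
# every level of the send or receive path.
ParsedKey = namedtuple('ParsedKey', ['src_device', 'dst_device', 'edge_name'])

def parse_key(full_key):
    # Hypothetical simplified format: "src;dst;edge".
    src, dst, edge = full_key.split(';')
    return ParsedKey(src, dst, edge)

def send(parsed, value, table):
    # Lower layers take the already-parsed key; no repeated string parsing.
    table[parsed.edge_name] = value

def recv(parsed, table):
    return table[parsed.edge_name]
```

With this shape, a key is parsed once where it enters the system; every layer below works on `ParsedKey` fields directly, which is the allocation/parsing saving the commit describes.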
Change: 125988941
Change: 125985670
Change: 125983822
Change: 125979359
Fix word2vec_test's tmp path
Change: 125975221
Change: 125968495
Change: 125967269
Change: 125964943
Change: 125964567
Change: 125963660