SimplePlacer -> Placer
And clean up a couple unneeded headers.
PiperOrigin-RevId: 167955883
TensorFlow graphs.
For a real-world large graph (13k nodes, 20k edges), this change:
* reduces all heap allocations by 19%
* reduces retained (final) heap allocations by 2.2%
* reduces CPU time by 11.2%
In most TF graphs, the set of unique values passed to Node::set_assigned_device_name() is quite small. This change adds an interning table to the Graph object, which contains all of the unique values used for Node::set_assigned_device_name(), along with a look-up table from name to index. This is the main source of the reduction in retained heap memory; nearly all nodes are assigned to just one or two unique devices.
This change removes the "string assigned_device_name_" field from the Node class, and replaces it with "int assigned_device_name_index_". However, because you need both the index and the name table to get the actual value, the Node::assigned_device_name() accessor needs access to the parent Graph. This requires adding a "Graph* graph_" field to the Node class.
In the future, if all users of this property are converted to use Graph::assigned_device_name(Node*), then the Node::graph_ field can be deleted, and the space reclaimed. However, doing so is out of the scope of this CL, and even with this new pointer field, the Node class is smaller than it was before, so this is still a net win.
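A minimal sketch of the interning idea, assuming a simplified standalone table class rather than the real Graph and Node types. The names DeviceNameTable, InternedNode, and Intern() are hypothetical; the node's table pointer stands in for the Node::graph_ back-pointer described above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-graph interning table for device names.
class DeviceNameTable {
 public:
  // Index 0 is reserved for "no device assigned" (the empty string).
  DeviceNameTable() { Intern(""); }

  // Returns the index for `name`, adding it to the table if it is new.
  int Intern(const std::string& name) {
    auto it = index_.find(name);
    if (it != index_.end()) return it->second;
    const int idx = static_cast<int>(names_.size());
    names_.push_back(name);
    index_.emplace(name, idx);
    return idx;
  }

  // Returns the name for a previously interned index.
  const std::string& Name(int idx) const { return names_[idx]; }
  std::size_t size() const { return names_.size(); }

 private:
  std::vector<std::string> names_;              // index -> name
  std::unordered_map<std::string, int> index_;  // name -> index
};

// Hypothetical node: stores an int index plus a pointer to the table,
// instead of its own std::string copy of the device name.
class InternedNode {
 public:
  explicit InternedNode(DeviceNameTable* table) : table_(table) {}

  void set_assigned_device_name(const std::string& name) {
    assigned_device_name_index_ = table_->Intern(name);
  }
  const std::string& assigned_device_name() const {
    return table_->Name(assigned_device_name_index_);
  }
  int assigned_device_name_index() const { return assigned_device_name_index_; }

 private:
  DeviceNameTable* table_;              // stands in for Node::graph_
  int assigned_device_name_index_ = 0;  // 0 means "not assigned"
};
```

Nodes placed on the same device share one table entry, so in this sketch the per-node cost is one int plus the back-pointer, however long the device name is.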
The placement algorithm in simple_placer.cc is one of the main accessors of the Node::assigned_device_name property. This CL contains significant changes to simple_placer.cc, which directly take advantage of the fact that the property is an index into a name table, rather than treating it simply as a string. Many temporary allocations are also removed, which is the main source of the reduction in total heap allocations.
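To illustrate why an index is cheaper to work with than a string, here is a hypothetical helper (not taken from simple_placer.cc) that buckets nodes by device using a plain vector indexed by the name-table index. It assumes the indices are dense and smaller than the table size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: with device names interned, nodes can be grouped by
// device with a vector indexed by the device-name index. No string hashing,
// comparison, or copying is involved.
std::vector<std::vector<int>> GroupNodesByDevice(
    const std::vector<int>& device_index_per_node,  // node id -> device index
    std::size_t num_device_names) {                  // size of the name table
  std::vector<std::vector<int>> groups(num_device_names);
  for (std::size_t node_id = 0; node_id < device_index_per_node.size();
       ++node_id) {
    groups[device_index_per_node[node_id]].push_back(
        static_cast<int>(node_id));
  }
  return groups;
}
```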
This CL also contains a few changes that remove short-lived allocations in unrelated code, such as the changes in op.cc/h, costmodel.cc, etc. In C++ it is extremely easy to allocate memory accidentally, especially through implicit conversions and copy constructors.
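A small illustration of the kind of hidden copy mentioned above, assuming C++17 std::string_view as the non-owning alternative; both functions are hypothetical. Passing a const char* to a const std::string& parameter builds a temporary string, while passing it to a std::string_view does not.

```cpp
#include <string>
#include <string_view>

// `const std::string&` parameter: a `const char*` argument is implicitly
// converted into a temporary std::string, copying the characters. For names
// longer than the small-string buffer, that copy allocates on the heap.
bool SharesBufferByRef(const std::string& s, const char* original) {
  return s.data() == original;
}

// `std::string_view` parameter: only a pointer and a length are recorded,
// so the caller's characters are used in place and nothing is copied.
bool SharesBufferByView(std::string_view s, const char* original) {
  return s.data() == original;
}
```

For a name such as "/job:localhost/replica:0/task:0/device:CPU:0", SharesBufferByRef returns false (the function saw a copy) while SharesBufferByView returns true.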
All of the changes in this CL were motivated by empirical measurement, using CPU profiling and heap profiling.
PiperOrigin-RevId: 157762909
(is in the list of passed in devices).
Before this change, if the candidate_device_name was /cpu:0 but
the list of valid devices was /cpu:1 (because that's what the user
specified), we would still apply the heuristics to assign the device
to candidate_device_name, since the device type is the same.
However, we never want to ignore what the user has specified, so
we now check that the device name matches one of the devices
in the list of valid devices for that node (as determined by the
user or the placer constraints).
This adds tests to verify this behavior.
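A sketch of the difference between the old and new checks, assuming short device names like /cpu:0 and a plain vector of valid device names. DeviceTypeOf, AcceptByType, and AcceptByName are hypothetical helpers, not the placer's actual functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: extracts the device type from a short device name,
// e.g. "cpu" from "/cpu:0".
std::string DeviceTypeOf(const std::string& name) {
  std::size_t start = name.rfind('/');
  start = (start == std::string::npos) ? 0 : start + 1;
  const std::size_t end = name.find(':', start);
  return end == std::string::npos ? name.substr(start)
                                  : name.substr(start, end - start);
}

// Old behavior (sketch): accept the candidate if any valid device merely
// has the same device type.
bool AcceptByType(const std::string& candidate,
                  const std::vector<std::string>& valid) {
  const std::string type = DeviceTypeOf(candidate);
  return std::any_of(valid.begin(), valid.end(), [&](const std::string& d) {
    return DeviceTypeOf(d) == type;
  });
}

// New behavior (sketch): accept the candidate only if its full name is one
// of the valid devices for the node.
bool AcceptByName(const std::string& candidate,
                  const std::vector<std::string>& valid) {
  return std::find(valid.begin(), valid.end(), candidate) != valid.end();
}
```

With a user-specified valid list of {"/cpu:1"}, AcceptByType("/cpu:0", ...) is true (the old bug) while AcceptByName("/cpu:0", ...) is false.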
Also added a note about the check for assigned_device_name in one
of the loops. The check is not strictly necessary:
adding a node to the colocation_group structure
at the beginning of the function reads the existing assigned_device_name
and populates the list of possible devices, so GetDevicesByNode()
will always return a single device in this case. However, the
check avoids some unneeded computation, so it is still
worth keeping. I also now add an AssignAndLog statement to make sure
stateful placements are logged (before this change, they weren't).
cc @DavidNorman
Change: 143978397
Change: 134721831
Change: 123900938
(nodes with one non-ref output and one consumer), and places it
preferentially with its consumer.
For example:
        assign
       /      \
     var     input
In the above graph, assign is bound to the device of 'var' due to the
reference edge. This heuristic binds 'input' to the same device as
the assign, because it has only one consumer.
This addresses the general problem of colocating initializers with
their variables, and other similar cases. There are very few reasons
to want to place 'input' on a device other than its consumer's (there
are some contrived cases, but that's why this is a heuristic).
This CL adds a test case for this small example above, illustrative
of the general problem.
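The heuristic can be sketched as follows, assuming a simplified node representation where each node lists the indices of its consumers and whether it has a reference output. HeuristicNode and ColocateWithSoleConsumer are hypothetical names, and this sketch works on raw device strings rather than modeling the placer's colocation groups.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified node for illustrating the heuristic.
struct HeuristicNode {
  std::string name;
  std::string device;          // empty if not yet placed
  int num_outputs;             // number of outputs the op produces
  bool has_ref_output;         // true if any output is a reference
  std::vector<int> consumers;  // indices of nodes that consume an output
};

// Sketch of the heuristic: an unplaced node with exactly one non-ref output
// and exactly one consumer is placed on its consumer's device, if the
// consumer already has one.
void ColocateWithSoleConsumer(std::vector<HeuristicNode>* nodes) {
  for (HeuristicNode& node : *nodes) {
    if (!node.device.empty()) continue;
    if (node.num_outputs != 1 || node.has_ref_output) continue;
    if (node.consumers.size() != 1) continue;
    const HeuristicNode& consumer = (*nodes)[node.consumers[0]];
    if (!consumer.device.empty()) node.device = consumer.device;
  }
}
```

For the var/assign/input graph, 'assign' is already bound to the device of 'var', so 'input' picks up that device; a node with two consumers is left unplaced.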
An extension of this CL would be to do the same thing not just for
single output / single consumer nodes, but whenever all out edges of
a node connect to the same 'colocation group'.
Change: 123896863
AssignDevice and into the main loop, for a future refactor where we
inject the choice of algorithm to use when selecting a device for a node.
Currently this continues to use just the first device in the list,
but we would like to be able to play around with algorithms that choose
alternative strategies, perhaps based on other heuristics and runtime
information. SimplePlacer remains the code that performs the
precondition filters (hard-device assignment and validation), so that
other placement algorithms don't have to worry about enforcing the correct
assignments / conditions.
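One way to express the injectable selection step is a std::function strategy, sketched below under that assumption. DeviceSelector, SelectFirstDevice, and ChooseDevice are hypothetical names; SelectFirstDevice mirrors the current behavior of taking the first device in the list.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical signature for a pluggable device-selection strategy: given
// the candidates that survived the placer's hard constraints, pick one.
using DeviceSelector =
    std::function<std::string(const std::vector<std::string>&)>;

// Current behavior: take the first device in the list.
std::string SelectFirstDevice(const std::vector<std::string>& candidates) {
  return candidates.front();
}

// Sketch: the placer applies its precondition filters first, then delegates
// the final choice to the injected strategy. Returns "" if nothing survives.
std::string ChooseDevice(const std::vector<std::string>& filtered_candidates,
                         const DeviceSelector& select) {
  if (filtered_candidates.empty()) return "";
  return select(filtered_candidates);
}
```

Because filtering stays in the placer, an alternative strategy only ever sees valid candidates and never has to re-check hard assignments.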
Change: 121339923
by colocation_groups in _class attr), clean up calls that pass in the
name to id map, which is no longer needed in SimplePlacer. Should speed
up graph construction in C++ because we don't need to iterate over all of the nodes
once to build the map.
In the future, utilities should rely on node ids instead of node names
so the map is not necessary (ideally). Alternatively, a structure in
Graph should maintain the mapping.
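A minimal sketch of the second alternative, where the graph maintains the name-to-id mapping itself as nodes are added, so no caller needs a separate pass over all nodes. IndexedGraph and its methods are hypothetical and do not reflect the actual Graph class.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical graph that keeps its name -> id index up to date on every
// AddNode call.
class IndexedGraph {
 public:
  int AddNode(const std::string& name) {
    const int id = static_cast<int>(names_.size());
    names_.push_back(name);
    name_to_id_[name] = id;
    return id;
  }

  // Returns the node id for `name`, or -1 if there is no such node.
  int FindNodeId(const std::string& name) const {
    auto it = name_to_id_.find(name);
    return it == name_to_id_.end() ? -1 : it->second;
  }

  const std::string& NodeName(int id) const { return names_[id]; }

 private:
  std::vector<std::string> names_;                  // id -> name
  std::unordered_map<std::string, int> name_to_id_;  // name -> id
};
```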
Change: 121302549
tensorflow/core/ files and build targets.
Change: 113078283
directly so we can drop it from port.h.
Change: 111506630
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164
error handling, updates to website.
Changes:
- Removes redundant reshape from image models by @mrry
- Default TensorBoard to localhost by @danmane
- Reformatting of tensorflow/core by @josh11b
- Make tutorials backwards compatible to 0.5.0 by @girving
- Improve print documentation (md files not updated).
- Add proper scrolling to sitemap by @martinwicke
Base CL: 107956254
TensorFlow is an open source software library for numerical computation
using data flow graphs.
Base CL: 107276108