path: root/tensorflow/core/grappler/costs/virtual_scheduler.cc
* Add support for modeling fast memory close to the processor/GPU (A. Unique TensorFlower, 2018-10-09)
  PiperOrigin-RevId: 216453979
* Refactor CalculateOutputSize() from a VirtualScheduler protected member function to utils; refactor EstimateSize() from memory_optimizer.cc to utils; small changes to improve readability (Peter Ma, 2018-10-08)
  PiperOrigin-RevId: 216307257
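The commit above moves output-size estimation into shared utils. A minimal sketch of what such a helper computes (hypothetical names and dtype table; not the actual TensorFlow C++ API): a tensor's estimated size is the product of its dimensions times the element size of its dtype.

```python
# Hypothetical sketch of output-size estimation, not the actual
# TensorFlow utils signature: size = product of dims * bytes per element.
DTYPE_SIZE = {"float16": 2, "float32": 4, "int8": 1, "int32": 4, "int64": 8}

def calculate_output_size(shape, dtype):
    """Estimate tensor size in bytes; unknown dims (-1) are treated as 1."""
    num_elements = 1
    for dim in shape:
        num_elements *= max(dim, 1)  # treat unknown (-1) dims conservatively as 1
    return num_elements * DTYPE_SIZE[dtype]

# A 32x224x224x3 float32 tensor: 32*224*224*3 elements * 4 bytes each.
print(calculate_output_size([32, 224, 224, 3], "float32"))  # 19267584
```

Treating unknown dimensions as size 1 is one possible convention for shapes that were not fully inferred; an estimator could equally flag such results as inaccurate.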
* Sorted the per-device summary printout by device name to improve readability (A. Unique TensorFlower, 2018-08-23)
  PiperOrigin-RevId: 210007888
* Add two counters to the Costs struct: the total number of ops processed/predicted, and the number of ops predicted with unknown shapes (Peter Ma, 2018-08-10)
  PiperOrigin-RevId: 208274158
* Minor code cleanup in virtual_scheduler (Max Galkin, 2018-08-03)
  PiperOrigin-RevId: 207334214
* Explicitly cast the types of a few variables in VLOG statements to avoid an issue where the compiler isn't sure of the type when building for arm64 (A. Unique TensorFlower, 2018-08-02)
  PiperOrigin-RevId: 207151595
* Tabularized the VLOGs printing per-op execution times to make it easier to see which ops take the most time (A. Unique TensorFlower, 2018-07-24)
  PiperOrigin-RevId: 205913222
* Show breakdown of execution cost into compute and memory cost for op summarization (A. Unique TensorFlower, 2018-04-24)
  PiperOrigin-RevId: 194117030
* Show breakdown of total execution time into compute and memory time (A. Unique TensorFlower, 2018-03-30)
  PiperOrigin-RevId: 191119550
* Relax one of the error conditions to allow modeling graphs without an explicit set of feed nodes (Max Galkin, 2018-02-22)
  PiperOrigin-RevId: 186661729
* In VirtualScheduler, if there is a Recv without a Send, handle the Recv as an initially ready node (A. Unique TensorFlower, 2018-02-21)
  PiperOrigin-RevId: 186509851
* Enable the use of scheduling heuristics to reduce peak memory usage by default (Benoit Steiner, 2018-02-12)
  PiperOrigin-RevId: 185413855
* Return an error instead of an assertion when processing an ill-formed graph or an invalid set of fetch nodes (Benoit Steiner, 2018-02-01)
  PiperOrigin-RevId: 184192790
* Combine host and device memory proto fields (Yuefeng Zhou, 2018-01-17)
  PiperOrigin-RevId: 182284426
* Add _HostSend and _HostRecv to grappler (A. Unique TensorFlower, 2018-01-12)
  PiperOrigin-RevId: 181774069
* Implement priority logic in CompositeNodeManager for the case where there are multiple candidate nodes with the same time_ready. The priority order is _Send, _Recv, and then the other ops. This also improves run-to-run consistency in simulated performance (A. Unique TensorFlower, 2018-01-02)
  PiperOrigin-RevId: 180592922
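The tie-breaking rule above can be sketched as an ordering key: earliest time_ready first, then op-type priority (_Send before _Recv before everything else), then node name for determinism. This is an assumed simplification in Python, not the actual C++ implementation:

```python
# Sketch (assumed semantics, not the real CompositeNodeManager code):
# among ready candidates, order by time_ready, then by op priority
# (_Send, _Recv, other), then by name for run-to-run determinism.
OP_PRIORITY = {"_Send": 0, "_Recv": 1}  # everything else gets priority 2

def pick_next(candidates):
    """candidates: list of (node_name, op_type, time_ready) tuples."""
    return min(candidates,
               key=lambda n: (n[2], OP_PRIORITY.get(n[1], 2), n[0]))

nodes = [("add_1", "AddN", 5.0), ("recv_x", "_Recv", 5.0), ("send_y", "_Send", 5.0)]
print(pick_next(nodes))  # ('send_y', '_Send', 5.0)
```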
* Make the analytical estimator and the virtual cluster own the ready_node_manager used by the virtual_scheduler. This allows us to customize the ready_node_manager for each analyzer instance (A. Unique TensorFlower, 2017-12-22)
  PiperOrigin-RevId: 179967778
* Updated the virtual cluster to return the proper error code if the simulated peak memory usage exceeds the available memory (Benoit Steiner, 2017-12-22)
  PiperOrigin-RevId: 179952918
* In the FirstReady node manager, use the node name as a tie-breaker when multiple nodes have the same time_ready (A. Unique TensorFlower, 2017-12-22)
  PiperOrigin-RevId: 179940344
* Fix typo in struct name: RecvNodeDescritorHash -> RecvNodeDescriptorHash (A. Unique TensorFlower, 2017-12-20)
  PiperOrigin-RevId: 179756840
* Add CompositeNodeManager for Grappler VirtualScheduler. CompositeNodeManager has a per-device LIFO manager plus FirstReadyManagers for _Send and _Recv ops, and chooses the first-ready op among the candidates from the per-device LIFOManagers and the _Send/_Recv FirstReadyManagers. This maximizes producer-consumer locality within a device (via LIFO) while avoiding the previously reported multi-device scheduling inefficiency, by managing _Send and _Recv ops separately under a global FirstReady policy across devices. It is implemented but not enabled; VirtualScheduler still uses FirstReadyManager (A. Unique TensorFlower, 2017-12-12)
  PiperOrigin-RevId: 178787352
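The hybrid policy described above can be sketched as follows. This is an assumed simplification, not the real C++ class: _Send/_Recv ops go into global first-ready heaps, other ops into per-device LIFO stacks, and the current node is the earliest-ready candidate among the tops of all of them.

```python
import collections
import heapq

# Sketch of the CompositeNodeManager idea (assumed simplification of the
# C++ class): per-device LIFO for ordinary ops, global first-ready heaps
# for _Send and _Recv, earliest time_ready wins among the candidates.
class CompositeNodeManager:
    def __init__(self):
        self.per_device_lifo = collections.defaultdict(list)  # device -> stack
        self.send_heap = []  # (time_ready, name) min-heaps
        self.recv_heap = []

    def add_node(self, name, op, device, time_ready):
        if op == "_Send":
            heapq.heappush(self.send_heap, (time_ready, name))
        elif op == "_Recv":
            heapq.heappush(self.recv_heap, (time_ready, name))
        else:
            self.per_device_lifo[device].append((time_ready, name))

    def get_curr_node(self):
        # Candidates: top of each per-device LIFO plus earliest _Send/_Recv.
        candidates = [s[-1] for s in self.per_device_lifo.values() if s]
        for heap in (self.send_heap, self.recv_heap):
            if heap:
                candidates.append(heap[0])
        return min(candidates)[1] if candidates else None

mgr = CompositeNodeManager()
mgr.add_node("conv", "Conv2D", "gpu:0", 3.0)
mgr.add_node("send_a", "_Send", "cpu:0", 1.0)
print(mgr.get_curr_node())  # send_a
```

The LIFO stacks keep a producer's consumer close behind it on the same device, while the global heaps stop a late _Recv from starving other devices.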
* Prefix inaccurate costs with "~" in VirtualScheduler verbose log. Fix some inaccurate estimates exposed by this approach:
  - propagate the inaccuracy flag when merging device stats;
  - estimate Const as a no-op;
  - estimate RandomUniform, Relu and Softmax as element-wise;
  - consider estimates accurate for known element-wise ops in op_level_cost_estimator.
  (Max Galkin, 2017-12-01)
  PiperOrigin-RevId: 177643976
* Added an option to assume that the shape of fed nodes is unknown, since any shape can actually be used (Benoit Steiner, 2017-11-28)
  PiperOrigin-RevId: 177203023
* Small code cleanup (Benoit Steiner, 2017-11-27)
  PiperOrigin-RevId: 177089408
* Minor change in VirtualScheduler logging: there is sometimes a difference between device total uptime and the sum of per-op computation times, because uptime includes waiting for channel communications (Max Galkin, 2017-11-15)
  PiperOrigin-RevId: 175912780
* Skip generating input/output properties for _Send and _Recv ops if those ops are not created by VirtualScheduler (A. Unique TensorFlower, 2017-11-10)
  PiperOrigin-RevId: 175314193
* Updated the virtual scheduler to use legal names when inserting Send/Recv nodes in the graph (Benoit Steiner, 2017-10-12)
  PiperOrigin-RevId: 171986401
* Scheduler exports tensor size info to RunMetadata. In addition, a tensor size histogram can optionally be printed out (use vmodule=analytical_cost_estimator=1 or 2) (A. Unique TensorFlower, 2017-10-09)
  PiperOrigin-RevId: 171619454
* PiperOrigin-RevId: 170752644 (A. Unique TensorFlower, 2017-10-02)
* Define OpContext and use it for OpLevelCostEstimator. This CL does not add any functionality (except that GraphDef's function library pointer is passed to OpContext), but we can later add fields to the OpContext struct for extending VirtualCluster, Scheduler, Placer, and others (A. Unique TensorFlower, 2017-09-26)
  PiperOrigin-RevId: 170157235
* FirstReadyManager for scheduling nodes in VirtualScheduler. The current FIFOManager may yield inefficient scheduling: a _Recv pushed to the FIFO blocks other nodes that could run before the _Recv, due to the node order in the FIFO. FirstReadyManager picks the node with the earliest time_ready in the queue, avoiding this problem. Also fixed VirtualPlacer to properly set the device when a node's device name does not include a job name, and to set GPU:0 as the default device (A. Unique TensorFlower, 2017-09-12)
  PiperOrigin-RevId: 168418455
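The FIFO-blocking problem described above can be illustrated with a small sketch (assumed semantics, not the actual manager code): a FIFO returns nodes in insertion order, so an early-queued _Recv with a late time_ready delays nodes that are already ready, while a first-ready manager always returns the earliest-ready node, using the name as a tie-breaker as in the later tie-breaking commit.

```python
import heapq

# nodes: list of (name, time_ready) in insertion order.
def fifo_order(nodes):
    """FIFO: insertion order, regardless of time_ready."""
    return [name for name, _ in nodes]

def first_ready_order(nodes):
    """First-ready: earliest time_ready first; node name breaks ties."""
    heap = [(t, name) for name, t in nodes]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

nodes = [("recv_a", 10.0), ("mul", 2.0), ("add", 1.0)]
print(fifo_order(nodes))         # ['recv_a', 'mul', 'add'] - recv blocks the rest
print(first_ready_order(nodes))  # ['add', 'mul', 'recv_a']
```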
* Relax the feed_nodes collection check, which triggers a false positive in some modes where the feed node collection is auto-generated. Keep it as a warning to help correct user-provided feed node lists (A. Unique TensorFlower, 2017-09-12)
  PiperOrigin-RevId: 168396408
* Change the device string in RecvNodeDescriptor in VirtualScheduler from a const reference to a const value, as the RecvNodeDescriptor (and the cached_recv_nodes map) outlives the device string from the NodeDef (A. Unique TensorFlower, 2017-08-18)
  PiperOrigin-RevId: 165748244
* Extend Grappler API to accept feed and fetch node lists (A. Unique TensorFlower, 2017-08-17)
  PiperOrigin-RevId: 165630567
* Fix _Recv op caching for multi-output port ops in VirtualScheduler (A. Unique TensorFlower, 2017-07-24)
  PiperOrigin-RevId: 162970793
* Automated g4 rollback of changelist 162782896 (Anna R, 2017-07-21)
  PiperOrigin-RevId: 162797577
* Properly schedule merge nodes (Benoit Steiner, 2017-07-21)
  PiperOrigin-RevId: 162792987
* Fix _Recv op caching for multi-output port ops in VirtualScheduler (A. Unique TensorFlower, 2017-07-21)
  PiperOrigin-RevId: 162782896
* Merge changes from github (Shanqing Cai, 2017-07-10)
  Included commits:
  - d0f53f77f (Penghao Cen): Minor fix typo (#11323)
  - 02fcf564e (Chris Song): Fix misspells.
  - 764c9b6b4 (Louis Tiao): Fixed typo in docstring
  - f8cd1283e (Shanqing Cai): Chaser
  - 01383b946 (Shanqing Cai): Adapt TensorFlowTestCase.setUp() to new reset_default_graph() semantics. Avoid calling reset_default_graph() directly to prevent exceptions in cases where test methods error out from within nested graph contexts, which can leave _default_graph_stack non-empty in certain Python versions.
  - 0ffc37890 (Amit Patankar): Removing second declaration of functions.
  - f9c9cacb0 (A. Unique TensorFlower): Refactor ElementalIrEmitter's slice index finding code into IrArray::Index::SourceIndexOfSlice(). PiperOrigin-RevId: 161140653
  - ba297aec9 (A. Unique TensorFlower): Update ops-related pbtxt files. PiperOrigin-RevId: 161138258
  - 68d666737 (Alexandre Passos): Fixes a reentrant lock issue with tensors using ndarray memory which uses tensor memory. PiperOrigin-RevId: 161137788
  - a2ee8bca3 (A. Unique TensorFlower): Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor. PiperOrigin-RevId: 161137741
  - 755fa7b50 (Mark Daoust): Block generate_test and docs generation from running in python3; doc generation is currently unsupported in python3, and both end in errors in python 3.5.1+. PiperOrigin-RevId: 161137467
  - 97cbcac45 (Peter Hawkins): [TF:XLA] Fix failure in functionalize_control_flow rewrite for Enter nodes that are unused. Make sure we ignore such nodes without producing an error. PiperOrigin-RevId: 161136545
  - dabcb60bc (A. Unique TensorFlower): [XLA] Add reasonable error messages to Builder::Build for bad parameter numbers. PiperOrigin-RevId: 161136262
  - 0cbd249e8 (A. Unique TensorFlower): Add complex tensors support to `matrix_determinant`. PiperOrigin-RevId: 161132422
  - 335f1f14d (A. Unique TensorFlower): Extend static shape inference for SparseTensors with dense_shapes constructed using slicing. PiperOrigin-RevId: 161132391
  - 53604916e (Jianwei Xie): Fixed the missing labels test in TPUEstimator. PiperOrigin-RevId: 161131282
  - 9f57dc8dd (Bruno Rosa): Use mcpu instead of march for ppc64le; march is not supported by gcc on ppc64le.
  - 7d5c74a9c (Skye Wanderman-Milne): Move duplicate detection logic from Graph to FunctionLibraryDefinition. Turns out this is more useful, since there are many function libraries that don't belong to a graph. This will be used in a future change. Note that this maintains the current behavior of Graph. In addition, updates FunctionDefsEqual() to handle unset attr entries (I ran into this when using it in said future change). PiperOrigin-RevId: 161126628
  - 2caec3af1 (Shanqing Cai): Disable more timeseries py tests failing in OSS PIP GPU builds. PiperOrigin-RevId: 161124799
  - 0b5cce367 (Eugene Brevdo): Get TopK op working on GPU again; extend using cub's radix sort. 1. Undo rollback of Andreas Kirsch's initial implementation. 2. Use cub segmented radix sort instead of Andreas' heap-based impl for large k and small num_cols (thresholds of k=100, n=1000 determined empirically). 3. Use cub segmented radix sort if k == num_cols (this case is always faster). 4. Added benchmarks. Benchmarks show that the GPU implementation is up to 3x slower for small k but can be 10x faster for large num_cols and k. Benchmark results (m=128; CPU = use_gpu False, GPU = use_gpu True):

        n       k       CPU wall_time  CPU thrpt     GPU wall_time  GPU thrpt
        10      5       0.000166 s     0.0077 GB/s   0.000796 s     0.00161 GB/s
        10      9       0.00017 s      0.00751 GB/s  0.000796 s     0.00161 GB/s
        10      10      0.00017 s      0.00753 GB/s  0.000775 s     0.00165 GB/s
        100     1       0.000155 s     0.0826 GB/s   0.000796 s     0.0161 GB/s
        100     50      0.000247 s     0.0519 GB/s   0.0008 s       0.016 GB/s
        100     99      0.000261 s     0.049 GB/s    0.000794 s     0.0161 GB/s
        100     100     0.000239 s     0.0536 GB/s   0.000777 s     0.0165 GB/s
        1000    1       0.000324 s     0.395 GB/s    0.000916 s     0.14 GB/s
        1000    10      0.00042 s      0.305 GB/s    0.000902 s     0.142 GB/s
        1000    500     0.0011 s       0.116 GB/s    0.00097 s      0.132 GB/s
        1000    990     0.00133 s      0.0962 GB/s   0.000993 s     0.129 GB/s
        1000    1000    0.00102 s      0.126 GB/s    0.000964 s     0.133 GB/s
        10000   10      0.002 s        0.64 GB/s     0.00288 s      0.445 GB/s
        10000   100     0.00233 s      0.549 GB/s    0.00325 s      0.394 GB/s
        10000   5000    0.0127 s       0.101 GB/s    0.00381 s      0.336 GB/s
        10000   9900    0.015 s        0.0853 GB/s   0.00438 s      0.292 GB/s
        10000   10000   0.0104 s       0.123 GB/s    0.00427 s      0.3 GB/s
        100000  100     0.0148 s       0.865 GB/s    0.0262 s       0.488 GB/s
        100000  1000    0.0201 s       0.636 GB/s    0.0263 s       0.486 GB/s
        100000  50000   0.214 s        0.0599 GB/s   0.0322 s       0.398 GB/s
        100000  99000   0.262 s        0.0489 GB/s   0.0377 s       0.34 GB/s
        100000  100000  0.118 s        0.108 GB/s    0.0365 s       0.351 GB/s

  Also includes: Automated g4 rollback of changelist 157169178.
  PiperOrigin-RevId: 161476569
* Prepare to remove a bunch of proto.h includes from tensorflow/core headers (Geoffrey Irving, 2017-06-29)
  The goal is to make kernels mostly independent of proto headers, which will let us lock down our .so imports. This CL does not remove any actual headers, but changes a bunch of files so that header removal is possible in a followup CL. It also marks the headers that will be removed with // TODO(b/62899350): Remove. RELNOTES: n/a
  PiperOrigin-RevId: 160552878
* Add tensor sizes to Grappler's VirtualScheduler's RunMetadata output (A. Unique TensorFlower, 2017-06-28)
  PiperOrigin-RevId: 160369034
* Add item's graph to partition_graphs in virtual cluster's run method. Put the node op name in timeline_label instead of node_name (A. Unique TensorFlower, 2017-06-23)
  PiperOrigin-RevId: 159986583
* Change default manager to FIFO for general performance improvements (A. Unique TensorFlower, 2017-06-20)
  PiperOrigin-RevId: 159626166
* Added LIFOManager, which uses a LIFO approach to return the next ready node (A. Unique TensorFlower, 2017-06-19)
  PiperOrigin-RevId: 159498617
* Make virtual scheduler potentially write summary information to the stepstats_ variable (A. Unique TensorFlower, 2017-06-14)
  PiperOrigin-RevId: 159039540
* Record the maximum memory usage simulated by the analytical_cost_estimator (Benoit Steiner, 2017-06-13)
  PiperOrigin-RevId: 158875735
* Make device placement be determined only by the virtual placer, and make the virtual placer private to the virtual scheduler. Remove device handling from graph properties. Remove the hard-coded default device type from analytical_cost_estimator / virtual_scheduler (A. Unique TensorFlower, 2017-06-10)
  PiperOrigin-RevId: 158625478
* Delete unnecessary (mistakenly duplicated) logging message (A. Unique TensorFlower, 2017-06-08)
  PiperOrigin-RevId: 158428506
* Add missing header inclusion (A. Unique TensorFlower, 2017-06-07)
  PiperOrigin-RevId: 158265934
* Profile memory usage in VirtualScheduler and report peak memory usage. To do so, NodeState now handles the different output ports of a node (in case a node has multiple outputs). Also, the VirtualScheduler code is cleaned up with more comments (A. Unique TensorFlower, 2017-06-06)
  PiperOrigin-RevId: 158209068
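The peak-memory bookkeeping described above can be sketched as a running maximum over simulated allocation and free events: each output port's bytes are added when the port is produced and subtracted once its last consumer finishes. This is an assumed simplification of the scheduler's per-port NodeState accounting, not the real implementation:

```python
# Sketch of peak-memory simulation (assumed simplification): events is the
# chronological list of memory deltas on one device, positive when an
# output port is produced, negative when its last consumer finishes.
def simulate_peak_memory(events):
    in_use = peak = 0
    for delta in events:
        in_use += delta
        peak = max(peak, in_use)
    return peak

# A node with two output ports (400 and 100 bytes) whose ports are freed
# at different times, plus a later 300-byte allocation.
print(simulate_peak_memory([+400, +100, -400, +300, -100, -300]))  # 500
```

Tracking ports individually matters here: freeing the whole node's output only after every consumer of every port finished would overstate the peak.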