author: A. Unique TensorFlower <gardener@tensorflow.org> 2017-08-15 21:39:28 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2017-08-15 21:42:58 -0700
commit 2b51e0ba27af69c914a7523d9aae232de09e3206 (patch)
tree 892089b4eca69784760e397f9bf1a5dd64dc9f18 /tensorflow/core/profiler
parent 8041185b16c0ce9348bd79900f6682fc3976bfd7 (diff)
1. Support profiling nodes in RunMetadata but not in GraphDef.
2. Add an API to allow easier profile retrieval. Currently in contrib.

PiperOrigin-RevId: 165399640
Diffstat (limited to 'tensorflow/core/profiler')
-rw-r--r--  tensorflow/core/profiler/README.md                         87
-rw-r--r--  tensorflow/core/profiler/g3doc/options.md                  26
-rw-r--r--  tensorflow/core/profiler/internal/tfprof_node.cc            7
-rw-r--r--  tensorflow/core/profiler/internal/tfprof_show_test.cc      31
-rw-r--r--  tensorflow/core/profiler/internal/tfprof_stats.cc          27
-rw-r--r--  tensorflow/core/profiler/internal/tfprof_stats_test.cc     14
-rw-r--r--  tensorflow/core/profiler/internal/tfprof_timeline_test.cc   4
7 files changed, 132 insertions, 64 deletions
diff --git a/tensorflow/core/profiler/README.md b/tensorflow/core/profiler/README.md
index f755089fc7..40fb1f836e 100644
--- a/tensorflow/core/profiler/README.md
+++ b/tensorflow/core/profiler/README.md
@@ -1,12 +1,10 @@
<h1>TensorFlow Profiler and Advisor</h1>
* [Features](#features)
-* [Interfaces](#interfaces)
-* [Tutorials](#tutorials)
+* [Quick Start](#quick-start)
* [Demo](#demo)
* [Feature Request and Bug Report](#feature-request-and-bug-report)
-Contact for bug report and feature request (xpan@google.com)
### Features
@@ -20,43 +18,57 @@ Contact for bug report and feature request (xpan@google.com)
* operation configuration check
* distributed runtime check (Not OSS)
-### Interfaces
-
-* Python API
-* Command Line
-* Visualization
-* C++ API (Not public, contact us if needed.)
-
-tfprof provides 4 different views to organize the profiles.
-
- * code view: operations are grouped by Python codes that generate them.
- * op view: operations are grouped by operation type (E.g. MatMul, Conv2D).
- * scope view: operations are organized based on name scope hierarchies.
- * graph view: operations are organized based on input/output.
-
-tfprof provides options to help user select, filter and order statistics.
-See [Options](g3doc/options.md) for detail instructions.
+### Quick Start
+
+```python
+# Enable tracing and profile the time and memory information.
+builder = tf.profiler.ProfileOptionBuilder
+opts = builder(builder.time_and_memory()).order_by('micros').build()
+with tf.contrib.tfprof.ProfileContext() as pctx:
+ with tf.Session() as sess:
+ x = build_model()
+ sess.run(tf.global_variables_initializer())
+ for i in range(10000):
+ if i == 10:
+ pctx.capture_next_run_meta()
+ _ = sess.run(x)
+ tf.profiler.profile(sess.graph, run_meta=pctx.run_meta(), cmd='op', options=opts)
+ else:
+ _ = sess.run(x)
+```
+```python
+# When using a high-level API, the session is usually hidden.
+# Register profiling with a ProfileContext, which performs profiling automatically.
+builder = tf.profiler.ProfileOptionBuilder
+opts = builder(builder.time_and_memory()).order_by('micros').build()
+opts2 = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
+with tf.contrib.tfprof.ProfileContext() as pctx:
+ # Profile time and memory from step 100 to 109.
+ # Note: High level APIs can run sessions for queue, variable initialization
+ # or training at different steps. Try profiling multiple steps.
+ pctx.add_auto_profiling('op', opts, range(100, 110))
+ # Profile model parameters once at step 10 (since it is static).
+ pctx.add_auto_profiling('scope', opts2, [10])
+ # High level API, such as slim, Estimator, etc.
+ train_loop()
```
--max_depth 10
--min_bytes 0
--min_micros 0
--min_params 0
--min_float_ops 0
--min_occurrence 0
--step -1
--order_by name
--account_type_regexes .*
--start_name_regexes .*
--trim_name_regexes
--show_name_regexes .*
--hide_name_regexes
--account_displayed_op_only false
--select params
--output stdout:
+
+```python
+# Profiling from the Python API is not interactive.
+# Dump the profiles to files and profile with interactive command line.
+with tf.contrib.tfprof.ProfileContext() as pctx:
+ pctx.add_auto_profile_dump('/tmp/profiles', [100])
+ train_loop()
+
+bazel-bin/tensorflow/core/profiler/profiler \
+ --graph_path=/tmp/profiles/graph.pbtxt \
+ --run_meta_path=/tmp/profiles/run_meta \
+ --op_log_path=/tmp/profiles/tfprof_log
+tfprof> op -select micros,bytes,occurrence -order_by micros
```
-### Tutorials
+<b>Detailed Tutorials</b>
* [Python API](g3doc/python_api.md)
* [Command Line Interface](g3doc/command_line.md)
@@ -64,6 +76,9 @@ See [Options](g3doc/options.md) for detail instructions.
* [Profile Memory](g3doc/profile_memory.md)
* [Profile Model Architecture](g3doc/profile_model_architecture.md)
* [Auto Detect and Advise](g3doc/advise.md)
+
+<b>Detailed Documentation</b>
+
* [Options](g3doc/options.md)
## Demo
diff --git a/tensorflow/core/profiler/g3doc/options.md b/tensorflow/core/profiler/g3doc/options.md
index bdcc6b2bd8..15712d04c2 100644
--- a/tensorflow/core/profiler/g3doc/options.md
+++ b/tensorflow/core/profiler/g3doc/options.md
@@ -41,6 +41,32 @@ can vary. For example `-max_depth` in scope view means the depth of
name scope <b>tree</b>. In op view, it means the length of operation <b>list</b>.
In graph view, it means the number of hops in the <b>graph</b>.
+### Times
+
+Most machines have multi-core CPUs. Some have one or more accelerators
+installed. Each accelerator usually performs massively parallel processing. The
+profiler tracks the accumulated processing time across all of them. Hence, the
+accumulated processing time is likely larger than the wall time of each step.
+
+micros: The sum of cpu and accelerator times.
+accelerator_micros: The accelerator time.
+cpu_micros: The cpu time.
+
+### Memory
+
+Tensor memory is usually ref-counted and released once there are no more
+references to it, which makes the release of memory difficult to track.
+Currently, the profiler only tracks the allocation of memory. As a result, the
+accumulated memory requests are usually larger than the peak memory of the
+overall model.
+
+bytes: The memory allocations requested by the operation.
+peak_bytes: The peak requested memory (not de-allocated) by the operation.
+residual_bytes: The memory requested by the operation and not de-allocated
+ when Compute finishes.
+output_bytes: The memory output by the operation. It's not necessarily requested
+ by the current operation. For example, it can be a tensor
+ forwarded from input to output, with in-place mutation.
###Docs
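The Memory section's claim above — that accumulated requests exceed peak memory because only allocations, not releases, are tracked — can be illustrated with a small sketch. This is plain Python with made-up numbers, not tfprof code:

```python
# Hypothetical allocation tracker: counts every allocation (as the profiler
# does) while also tracking live bytes, to show why the two totals diverge.

def track_allocations(events):
    """events: list of (op_name, bytes_delta); positive = alloc, negative = free.

    Returns (accumulated_requested, peak_live_bytes).
    """
    accumulated = 0  # what a profiler that only sees allocations reports
    live = 0         # actual live memory, shrinks when buffers are freed
    peak = 0
    for _, delta in events:
        if delta > 0:
            accumulated += delta
        live += delta
        peak = max(peak, live)
    return accumulated, peak

# A buffer is allocated, freed, and its space re-used by a later op:
events = [("Conv2D", 9472), ("Conv2D", -9472), ("MatMul", 4096)]
accumulated, peak = track_allocations(events)
# accumulated = 13568 but peak = 9472: the re-used space is counted twice.
```

Because release events are invisible to the tracker, every re-used buffer is counted again, which is why summed `bytes` can exceed `peak_bytes` for the overall model.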
diff --git a/tensorflow/core/profiler/internal/tfprof_node.cc b/tensorflow/core/profiler/internal/tfprof_node.cc
index d4a784ffaa..19257968e6 100644
--- a/tensorflow/core/profiler/internal/tfprof_node.cc
+++ b/tensorflow/core/profiler/internal/tfprof_node.cc
@@ -25,7 +25,8 @@ bool CountAsAcceleratorTime(const string& device) {
}
bool CountAsCPUTime(const string& device) {
- return RE2::FullMatch(device, ".*/(device:gpu|gpu|cpu|device:sycl):\\d+");
+ return RE2::FullMatch(device,
+ ".*/(device:gpu|gpu|device:cpu|cpu|device:sycl):\\d+");
}
bool IsCanonicalDevice(const string& device) { return CountAsCPUTime(device); }
@@ -148,8 +149,8 @@ void TFGraphNode::AddStepStat(int64 step, const string& device,
if (IsCanonicalDevice(dev)) {
if (!canonical_device_.empty()) {
if (canonical_device_ != dev) {
- fprintf(stderr, "Unexpected: graph node changed device: %s->%s.\n",
- canonical_device_.c_str(), dev.c_str());
+ // TODO(xpan): Some RunMetadata node appears at multiple devices.
+ // Need to address it.
return;
}
} else {
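The regex change above can be sanity-checked outside of C++. This sketch uses Python's `re` in place of RE2 (both support plain alternation and `\d+`) to show why the old pattern missed fully-qualified CPU device names:

```python
import re

# Old pattern lacked a "device:cpu" alternative; new pattern (from the diff
# above) adds it.
OLD = r".*/(device:gpu|gpu|cpu|device:sycl):\d+"
NEW = r".*/(device:gpu|gpu|device:cpu|cpu|device:sycl):\d+"

dev = "/job:worker/replica:0/task:0/device:cpu:0"
assert re.fullmatch(OLD, dev) is None      # old pattern misses "device:cpu"
assert re.fullmatch(NEW, dev) is not None  # fixed pattern matches
assert re.fullmatch(NEW, "/gpu:0") is not None  # legacy short names still match
```

The legacy short forms (`/cpu:0`, `/gpu:0`) matched either way; only the canonical `device:cpu` spelling needed the new alternative.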
diff --git a/tensorflow/core/profiler/internal/tfprof_show_test.cc b/tensorflow/core/profiler/internal/tfprof_show_test.cc
index f2c8b662d0..24cf9afb17 100644
--- a/tensorflow/core/profiler/internal/tfprof_show_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_show_test.cc
@@ -110,13 +110,14 @@ TEST_F(TFProfShowTest, DumpAcceleratorAndCPUMicros) {
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ(
"node name | accelerator execution time | cpu execution "
- "time\n_TFProfRoot (--/404us, --/4.50ms)\n Conv2D (226us/226us, "
- "4.07ms/4.07ms)\n Conv2D_1 (178us/178us, 419us/419us)\n DW2 (0us/0us, "
- "11us/11us)\n DW2/Assign (0us/0us, 0us/0us)\n DW2/Initializer "
- "(0us/0us, 0us/0us)\n DW2/Initializer/random_normal (0us/0us, "
- "0us/0us)\n DW2/Initializer/random_normal/RandomStandardNormal "
- "(0us/0us, 0us/0us)\n DW2/Initializer/random_normal/mean "
- "(0us/0us, 0us/0us)\n DW2/Initializer/random_normal/mul (0us/0us, "
+ "time\n_TFProfRoot (--/404us, --/4.54ms)\n Conv2D (226us/226us, "
+ "4.07ms/4.07ms)\n Conv2D_1 (178us/178us, 419us/419us)\n "
+ "_retval_Conv2D_1_0_0 (0us/0us, 41us/41us)\n DW2 (0us/0us, 11us/11us)\n "
+ " DW2/Assign (0us/0us, 0us/0us)\n DW2/Initializer (0us/0us, "
+ "0us/0us)\n DW2/Initializer/random_normal (0us/0us, 0us/0us)\n "
+ " DW2/Initializer/random_normal/RandomStandardNormal (0us/0us, "
+ "0us/0us)\n DW2/Initializer/random_normal/mean (0us/0us, "
+ "0us/0us)\n DW2/Initializer/random_normal/mul (0us/0us, "
"0us/0us)\n DW2/Initializer/random_normal/shape (0us/0us, "
"0us/0us)\n DW2/Initializer/random_normal/stddev (0us/0us, "
"0us/0us)\n DW2/read (0us/0us, 0us/0us)\n DW (0us/0us, 2us/2us)\n "
@@ -156,27 +157,27 @@ TEST_F(TFProfShowTest, DumpOpMode) {
EXPECT_EQ(
"nodename|requestedbytes|totalexecutiontime|acceleratorexecutiontime|"
"cpuexecutiontime|#parameters|#float_ops|opoccurrence(run|defined)|"
- "inputshapes\nVariableV20B(0.00%,0.00%),13us(100.00%,0.27%),0us(100.00%,"
+ "inputshapes\nVariableV20B(0.00%,0.00%),13us(100.00%,0.26%),0us(100.00%,"
"0.00%),13us(100.00%,0.29%),451params(100.00%,100.00%),0float_ops(100.00%"
",0.00%),2|3\n\ninput_type:\t(run*2|defined*3)\texec_time:13us\n\nAdd0B("
- "0.00%,0.00%),0us(99.73%,0.00%),0us(100.00%,0.00%),0us(99.71%,0.00%),"
+ "0.00%,0.00%),0us(99.74%,0.00%),0us(100.00%,0.00%),0us(99.71%,0.00%),"
"0params(0.00%,0.00%),0float_ops(100.00%,0.00%),0|3\n\ninput_type:0:1,"
"\t1:1\t(run*0|defined*1)\texec_time:0us\ninput_type:0:2x2x6x12,\t1:1\t("
"run*0|defined*1)\texec_time:0us\ninput_type:0:3x3x3x6,\t1:1\t(run*0|"
- "defined*1)\texec_time:0us\n\nAssign0B(0.00%,0.00%),0us(99.73%,0.00%),"
+ "defined*1)\texec_time:0us\n\nAssign0B(0.00%,0.00%),0us(99.74%,0.00%),"
"0us(100.00%,0.00%),0us(99.71%,0.00%),0params(0.00%,0.00%),0float_ops("
"100.00%,0.00%),0|3\n\ninput_type:0:1,\t1:1\t(run*0|defined*1)\texec_"
"time:0us\ninput_type:0:2x2x6x12,\t1:2x2x6x12\t(run*0|defined*1)\texec_"
"time:0us\ninput_type:0:3x3x3x6,\t1:3x3x3x6\t(run*0|defined*1)\texec_"
- "time:0us\n\nConst0B(0.00%,0.00%),2us(99.73%,0.04%),0us(100.00%,0.00%),"
+ "time:0us\n\nConst0B(0.00%,0.00%),2us(99.74%,0.04%),0us(100.00%,0.00%),"
"2us(99.71%,0.04%),0params(0.00%,0.00%),0float_ops(100.00%,0.00%),1|"
"10\n\ninput_type:\t(run*1|defined*10)\texec_time:2us\n\nConv2D14.59KB("
- "100.00%,100.00%),4.89ms(99.69%,99.69%),404us(100.00%,100.00%),4.49ms(99."
- "67%,99.67%),0params(0.00%,0.00%),10.44kfloat_ops(100.00%,100.00%),2|"
+ "100.00%,100.00%),4.89ms(99.70%,98.87%),404us(100.00%,100.00%),4.49ms(99."
+ "67%,98.77%),0params(0.00%,0.00%),10.44kfloat_ops(100.00%,100.00%),2|"
"2\n\ninput_type:0:2x3x3x6,\t1:2x2x6x12\t(run*1|defined*1)\texec_time:"
"597us\ninput_type:0:2x6x6x3,\t1:3x3x3x6\t(run*1|defined*1)\texec_time:4."
- "29ms\n\nIdentity0B(0.00%,0.00%),0us(0.00%,0.00%),0us(0.00%,0.00%),0us(0."
- "00%,0.00%),0params(0.00%,0.00%),0float_ops(0.00%,0.00%),0|3\n\ninput_"
+ "29ms\n\nIdentity0B(0.00%,0.00%),0us(0.83%,0.00%),0us(0.00%,0.00%),0us(0."
+ "90%,0.00%),0params(0.00%,0.00%),0float_ops(0.00%,0.00%),0|3\n\ninput_"
"type:0:1\t(run*0|defined*1)\texec_time:0us\ninput_type:0:2x2x6x12\t(run*"
"0|defined*1)\texec_time:0us\ninput_type:0:3x3x3x6\t(run*0|defined*1)"
"\texec_time:0us\n\n",
diff --git a/tensorflow/core/profiler/internal/tfprof_stats.cc b/tensorflow/core/profiler/internal/tfprof_stats.cc
index ea3e7b487c..012c49525a 100644
--- a/tensorflow/core/profiler/internal/tfprof_stats.cc
+++ b/tensorflow/core/profiler/internal/tfprof_stats.cc
@@ -25,6 +25,22 @@ limitations under the License.
namespace tensorflow {
namespace tfprof {
+namespace {
+bool CreateRunMetadataNode(const string& name, NodeDef* def) {
+ // TODO(xpan): Better solution than blacklisting these 2 nodes. They
+ // actually cost some resources, maybe include them. Some nodes, such
+ // as _SOURCE appear in multiple devices, which breaks tfprof's assumption.
+ if (name == "RecvTensor" || name == "_SOURCE" ||
+ name.find("MEMCPY") != name.npos) {
+ return false;
+ }
+ def->set_name(name);
+ // TODO(xpan): Better operation type.
+ def->set_op("RunTimeOp");
+ return true;
+}
+} // namespace
+
TFStats::TFStats(std::unique_ptr<GraphDef> graph,
std::unique_ptr<RunMetadata> run_meta,
std::unique_ptr<OpLogProto> op_log,
@@ -191,7 +207,16 @@ void TFStats::AddRunMeta(int64 step, std::unique_ptr<RunMetadata> run_meta) {
name = node_stat.node_name().substr(0, split_pos);
}
auto node = nodes_map_.find(name);
- if (node != nodes_map_.end()) {
+ if (node == nodes_map_.end()) {
+ NodeDef def;
+ if (CreateRunMetadataNode(name, &def)) {
+ NodeDef* ndef = graph_->mutable_node()->Add();
+ ndef->MergeFrom(def);
+ nodes_map_[name] =
+ std::unique_ptr<TFGraphNode>(new TFGraphNode(ndef));
+ nodes_map_.at(name)->AddStepStat(step, dev_stat.device(), node_stat);
+ }
+ } else {
node->second->AddStepStat(step, dev_stat.device(), node_stat);
}
}
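The filtering logic that `CreateRunMetadataNode` adds above can be rendered in a few lines of Python. This is a hypothetical sketch of the same rules (a dict stands in for `NodeDef`), not tfprof code:

```python
# RunMetadata-only nodes get added to the graph as "RunTimeOp", unless they
# are runtime artifacts tfprof cannot attribute cleanly (e.g. _SOURCE appears
# on multiple devices, breaking the one-device-per-node assumption).
BLACKLIST = ("RecvTensor", "_SOURCE")

def create_run_metadata_node(name):
    """Return a dict standing in for a NodeDef, or None if filtered out."""
    if name in BLACKLIST or "MEMCPY" in name:
        return None
    return {"name": name, "op": "RunTimeOp"}

assert create_run_metadata_node("_SOURCE") is None
assert create_run_metadata_node("MEMCPYDtoH") is None
assert create_run_metadata_node("_retval_Conv2D_1_0_0") == {
    "name": "_retval_Conv2D_1_0_0", "op": "RunTimeOp"}
```

This is what makes nodes like `_retval_Conv2D_1_0_0` show up in the test expectations below: they exist only in RunMetadata, not in the GraphDef, and are now synthesized into the profile.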
diff --git a/tensorflow/core/profiler/internal/tfprof_stats_test.cc b/tensorflow/core/profiler/internal/tfprof_stats_test.cc
index e67c158521..0bea14fe66 100644
--- a/tensorflow/core/profiler/internal/tfprof_stats_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_stats_test.cc
@@ -137,7 +137,7 @@ TEST_F(TFProfStatsTest, TestGraph) {
GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString(
- "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
+ "name: \"_TFProfRoot\"\ntotal_exec_micros: 4945\ntotal_requested_bytes: "
"14592\ntotal_parameters: 451\nchildren {\n name: "
"\"DW/Initializer/random_normal/mul\"\n children {\n name: "
"\"DW/Initializer/random_normal/RandomStandardNormal\"\n children {\n "
@@ -152,8 +152,8 @@ TEST_F(TFProfStatsTest, TestGraph) {
"}\n input_shapes {\n key: 1\n value {\n dim {\n "
"size: 1\n }\n }\n }\n total_definition_count: "
"4\n}\ntotal_float_ops: 10440\ntotal_accelerator_exec_micros: "
- "404\ntotal_cpu_exec_micros: 4500\ntotal_run_count: "
- "5\ntotal_definition_count: 31\ntotal_peak_bytes: "
+ "404\ntotal_cpu_exec_micros: 4541\ntotal_run_count: "
+ "6\ntotal_definition_count: 32\ntotal_peak_bytes: "
"9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
&expected));
EXPECT_EQ(expected.DebugString(), root.DebugString());
@@ -166,7 +166,7 @@ TEST_F(TFProfStatsTest, TestFloatOps) {
GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString(
- "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
+ "name: \"_TFProfRoot\"\ntotal_exec_micros: 4945\ntotal_requested_bytes: "
"14592\ntotal_parameters: 451\nchildren {\n name: \"Conv2D\"\n "
"exec_micros: 4292\n requested_bytes: 9472\n total_exec_micros: 4292\n "
" total_requested_bytes: 9472\n devices: "
@@ -199,7 +199,7 @@ TEST_F(TFProfStatsTest, TestFloatOps) {
"output_bytes: 512\n total_peak_bytes: 4096\n total_residual_bytes: "
"512\n total_output_bytes: 512\n}\ntotal_float_ops: "
"10440\ntotal_accelerator_exec_micros: 404\ntotal_cpu_exec_micros: "
- "4500\ntotal_run_count: 5\ntotal_definition_count: 34\ntotal_peak_bytes: "
+ "4541\ntotal_run_count: 6\ntotal_definition_count: 35\ntotal_peak_bytes: "
"9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
&expected));
EXPECT_EQ(expected.DebugString(), root.DebugString());
@@ -246,7 +246,7 @@ TEST_F(TFProfStatsTest, TestShowTensorValue) {
const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString(
- "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
+ "name: \"_TFProfRoot\"\ntotal_exec_micros: 4945\ntotal_requested_bytes: "
"14592\ntotal_parameters: 451\nchildren {\n name: \"DW\"\n "
"exec_micros: 2\n parameters: 162\n total_exec_micros: 2\n "
"total_parameters: 162\n devices: "
@@ -336,7 +336,7 @@ TEST_F(TFProfStatsTest, TestShowTensorValue) {
"total_run_count: 1\n total_definition_count: 10\n output_bytes: "
"1280\n total_output_bytes: 1280\n}\ntotal_float_ops: "
"10440\ntotal_accelerator_exec_micros: 404\ntotal_cpu_exec_micros: "
- "4500\ntotal_run_count: 5\ntotal_definition_count: 34\ntotal_peak_bytes: "
+ "4541\ntotal_run_count: 6\ntotal_definition_count: 35\ntotal_peak_bytes: "
"9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
&expected));
EXPECT_EQ(expected.DebugString(), root.DebugString());
diff --git a/tensorflow/core/profiler/internal/tfprof_timeline_test.cc b/tensorflow/core/profiler/internal/tfprof_timeline_test.cc
index 2fe3653ec2..5dd440e9a2 100644
--- a/tensorflow/core/profiler/internal/tfprof_timeline_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_timeline_test.cc
@@ -71,7 +71,7 @@ TEST_F(TFProfTimelineTest, GraphView) {
string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
- EXPECT_EQ(16947107375505024864ull, Hash64(dump_str));
+ EXPECT_EQ(1754536562981488144ull, Hash64(dump_str));
}
TEST_F(TFProfTimelineTest, ScopeView) {
@@ -85,7 +85,7 @@ TEST_F(TFProfTimelineTest, ScopeView) {
string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
- EXPECT_EQ(2710044785377031280ull, Hash64(dump_str));
+ EXPECT_EQ(17545174915963890413ull, Hash64(dump_str));
}
// TODO(xpan): tfprof_log is too large to include in testdata when adding