Fine-grained memory profiling

Add residual_bytes, peak_bytes and output_bytes.
Allow ordering, selecting and filtering by accelerator_micros/cpu_micros/peak_bytes/residual_bytes/output_bytes.
Also updated the testdata.

PiperOrigin-RevId: 164079214
author: A. Unique TensorFlower <gardener@tensorflow.org> 2017-08-02 21:29:03 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2017-08-02 21:34:37 -0700
commit: 19c27ef0d52c20a12800005751d36f96bd948869 (patch)
tree: 2b7ae380f2ea8f50b9db4a6967906430a1ac94b6 /tensorflow/core/profiler
parent: 565b872d040338d4369885877b8decdfac1faab1 (diff)
33 files changed, 1383 insertions, 1401 deletions
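
To make the new filter and ordering options concrete, here is a minimal Python sketch of the semantics described in the options.md changes in this commit. It is not tfprof's implementation: the `select_nodes` and `total_micros` helpers, the dictionary layout, and all numbers are hypothetical. The only facts taken from the commit are the attribute names and the note that `micros` sums `accelerator_micros` and `cpu_micros`. Sorting in descending order is an assumption made for the example.

```python
from typing import Dict, List, Union

# Hypothetical stand-in for a profiler node; keys mirror the attribute
# names added in this commit, but this is not a tfprof data structure.
Node = Dict[str, Union[str, int]]


def total_micros(node: Node) -> int:
    """options.md documents micros as accelerator_micros + cpu_micros."""
    return int(node["accelerator_micros"]) + int(node["cpu_micros"])


def select_nodes(nodes: List[Node], order_by: str, **minimums: int) -> List[Node]:
    """Keep nodes that meet every minimum, then sort by order_by.

    Descending order is an assumption for this sketch.
    """
    def value(node: Node, key: str) -> int:
        return total_micros(node) if key == "micros" else int(node[key])

    kept = [n for n in nodes
            if all(value(n, key) >= bound for key, bound in minimums.items())]
    return sorted(kept, key=lambda n: value(n, order_by), reverse=True)


# Made-up numbers, for illustration only.
nodes = [
    {"name": "Conv2D", "peak_bytes": 4096, "residual_bytes": 512,
     "output_bytes": 1024, "accelerator_micros": 40, "cpu_micros": 10},
    {"name": "DW/read", "peak_bytes": 0, "residual_bytes": 0,
     "output_bytes": 648, "accelerator_micros": 0, "cpu_micros": 3},
    {"name": "ScalarW", "peak_bytes": 256, "residual_bytes": 256,
     "output_bytes": 4, "accelerator_micros": 0, "cpu_micros": 2},
]

# Loosely analogous to: -min_peak_bytes 1 -order_by peak_bytes
by_peak = [n["name"] for n in select_nodes(nodes, "peak_bytes", peak_bytes=1)]

# Loosely analogous to: -min_micros 3 -order_by micros
by_micros = [n["name"] for n in select_nodes(nodes, "micros", micros=3)]
```

Because accelerator and CPU work can overlap, the summed `micros` value in this sketch should be read as total busy time rather than wall-clock time, which matches the caveat added to options.md.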
diff --git a/tensorflow/core/profiler/README.md b/tensorflow/core/profiler/README.md
index e748daba7a..6db38a59ae 100644
--- a/tensorflow/core/profiler/README.md
+++ b/tensorflow/core/profiler/README.md
@@ -106,7 +106,7 @@ _TFProfRoot (--/930.58k params)
 ### Show the most expensive operation types.
 ```
 tfprof> op -select micros,bytes,occurrence -order_by micros
-node name | output bytes | total execution time | accelerator execution time | cpu execution time | op occurrence (run|defined)
+node name | requested bytes | total execution time | accelerator execution time | cpu execution time | op occurrence (run|defined)
 SoftmaxCrossEntropyWithLogits      36.58MB (100.00%, 0.05%),      1.37sec (100.00%, 26.68%),           0us (100.00%, 0.00%),      1.37sec (100.00%, 30.75%),      30|30
 MatMul                        2720.57MB (99.95%, 3.66%),      708.14ms (73.32%, 13.83%),     280.76ms (100.00%, 41.42%),       427.39ms (69.25%, 9.62%),  2694|3450
 ConcatV2                       741.37MB (96.29%, 1.00%),       389.63ms (59.49%, 7.61%),        31.80ms (58.58%, 4.69%),       357.83ms (59.63%, 8.05%),  4801|6098
@@ -192,7 +192,7 @@ Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
 ******************************************************
 ```
 <left>
-[Timeline](g3doc/graph_timeline.png)
+![Timeline](g3doc/graph_timeline.png)
 </left>
 
 ```
@@ -213,7 +213,7 @@ pprof -png --nodecount=20 --sample_index=1 <filename>
 ```
 
 <left>
-[PprofGraph](g3doc/pprof.jpg)
+![PprofGraph](g3doc/pprof.jpg)
 </left>
 
 ### Feature Request and Bug Report
diff --git a/tensorflow/core/profiler/g3doc/options.md b/tensorflow/core/profiler/g3doc/options.md
index 9508379324..bdcc6b2bd8 100644
--- a/tensorflow/core/profiler/g3doc/options.md
+++ b/tensorflow/core/profiler/g3doc/options.md
@@ -48,7 +48,18 @@ In graph view, in means the number of hops in the <b>graph</b>.
 
 `-min_bytes`: Show nodes that request at least this number of bytes.
 
-`-min_micros`: Show nodes that spend at least this number of microseconds to run.
+`-min_peak_bytes`: Show nodes that using at least this number of bytes during peak memory usage.
+
+`-min_residual_bytes`: Show nodes that have at least this number of bytes not being de-allocated after Compute.
+
+`-min_output_bytes`: Show nodes that have at least this number of bytes output (no necessarily allocated by the nodes).
+
+`-min_micros`: Show nodes that spend at least this number of microseconds to run. It sums
+accelerator_micros and cpu_micros. Note: cpu and accelerator can run in parallel.
+
+`-min_accelerator_micros`: Show nodes that spend at least this number of microseconds to run on accelerator (e.g. GPU).
+
+`-min_cpu_micros`: Show nodes that spend at least this number of microseconds to run on CPU.
 
 `-min_params`: Show nodes that contains at least this number of parameters.
 
@@ -58,7 +69,7 @@ In graph view, in means the number of hops in the <b>graph</b>.
 
 `-step`: Show the stats of the this step when multiple steps of RunMetadata were added. By default, show the average of all steps."
 
-`-order_by`: Order the results by [name|depth|bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence]
+`-order_by`: Order the results by [name|depth|bytes|peak_bytes|residual_bytes|output_bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence]
 
 `-account_type_regexes`: Account and display the nodes whose types match one of the type regexes specified. tfprof allow user to define extra operation types for graph nodes through tensorflow.tfprof.OpLogProto proto. regexes are comma-sperated.
 
@@ -76,7 +87,7 @@ In graph view, in means the number of hops in the <b>graph</b>.
 Notes: See <b>overview</b> sesion on how does above options play with each other to decide the output and counting.
 
 `-select`: Comma-separated list of attributes to show. Supported attributes:
-[bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence|tensor_value|device|op_types|input_shapes].
+[bytes|peak_bytes|residual_bytes|output_bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence|tensor_value|device|op_types|input_shapes].
 
 `-output`: Output results as stdout, file or timeline.
 The format is ```output_type:key=value,key=value```.
diff --git a/tensorflow/core/profiler/g3doc/profile_memory.md b/tensorflow/core/profiler/g3doc/profile_memory.md
index e897967d3b..a00683d062 100644
--- a/tensorflow/core/profiler/g3doc/profile_memory.md
+++ b/tensorflow/core/profiler/g3doc/profile_memory.md
@@ -15,7 +15,6 @@ Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
 ```
 
 <left>
-TODO(xpan): Show the image correctly in github.
 ![Timeline](graph_timeline.png)
 </left>
 
@@ -26,7 +25,7 @@ TODO(xpan): Show the image correctly in github.
 # With op view, it shows you the aggregated output tensor bytes of each
 # operation type.
 tfprof> op -select bytes -order_by bytes
-node name | output bytes
+node name | requested bytes
 Identity                   32515.37MB (100.00%, 27.02%)
 FusedBatchNormGrad           10802.14MB (72.98%, 8.98%)
 FusedBatchNorm               10517.52MB (64.01%, 8.74%)
@@ -41,7 +40,7 @@ AddN                           2741.49MB (8.56%, 2.28%)
 
 # With scope view, you can see the operations that outputs largest tensors.
 tfprof> scope -order_by bytes -select bytes -min_bytes 100000000
-node name | output bytes
+node name | requested bytes
 _TFProfRoot (--/120356.38MB)
   tower_3/SepConv2d_2b_3x3/separable_conv2d (346.85MB/854.00MB)
     tower_3/SepConv2d_2b_3x3/separable_conv2d/depthwise (507.15MB/507.15MB)
@@ -61,7 +60,7 @@ _TFProfRoot (--/120356.38MB)
 
 # code view.
 tfprof> code  -max_depth 10 -select bytes -order_by bytes -start_name_regexes .*seq2seq.* -min_bytes 1
-node name | output bytes
+node name | requested bytes
 _TFProfRoot (--/74148.60MB)
   seq2seq_attention.py'>:168:run_filename_from...:none (0B/74148.60MB)
     seq2seq_attention.py'>:33:_run_code_in_main:none (0B/74148.60MB)
diff --git a/tensorflow/core/profiler/internal/advisor/expensive_operation_checker.h b/tensorflow/core/profiler/internal/advisor/expensive_operation_checker.h
index 85b99dc951..8b4b90b633 100644
--- a/tensorflow/core/profiler/internal/advisor/expensive_operation_checker.h
+++ b/tensorflow/core/profiler/internal/advisor/expensive_operation_checker.h
@@ -47,8 +47,8 @@ class ExpensiveOperationChecker : public Checker {
       fprintf(stderr, "Missing run_meta for %s\n", name().c_str());
       return;
     }
-    Options opts(3, 0, 1, 0, 0, 0, -1, "micros", {".*"}, {".*"}, {}, {".*"}, {},
-                 false, {"micros", "occurrence"}, "none", {});
+    Options opts(3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, "micros", {".*"}, {".*"},
+                 {}, {".*"}, {}, false, {"micros", "occurrence"}, "none", {});
     const MultiGraphNodeProto root = stats->ShowMultiGraphNode("op", opts);
     if (root.children_size() == 0) {
       return;
@@ -74,8 +74,8 @@ class ExpensiveOperationChecker : public Checker {
       fprintf(stderr, "Missing op_log (code traces) for %s\n", name().c_str());
       return;
     }
-    Options opts(100, 0, 1, 0, 0, 0, -1, "micros", {".*"}, {".*"}, {}, {".*"},
-                 {}, false, {"micros"}, "none", {});
+    Options opts(100, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, "micros", {".*"},
+                 {".*"}, {}, {".*"}, {}, false, {"micros"}, "none", {});
     const MultiGraphNodeProto root = stats->ShowMultiGraphNode("code", opts);
     const MultiGraphNodeProto* node = &root;
     // A trick here is: Usually, codes in library file are usually referenced
@@ -93,8 +93,8 @@ class ExpensiveOperationChecker : public Checker {
   }
 
   void CheckScopeView(const TFStats* stats) {
-    Options opts(100, 0, 100, 0, 0, 0, -1, "micros", {".*"}, {".*"}, {}, {".*"},
-                 {}, false, {"micros"}, "none", {});
+    Options opts(100, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, -1, "micros", {".*"},
+                 {".*"}, {}, {".*"}, {}, false, {"micros"}, "none", {});
     const GraphNodeProto root = stats->ShowGraphNode("scope", opts);
     if (root.children_size() == 0) {
       return;
diff --git a/tensorflow/core/profiler/internal/testdata/ckpt.data-00000-of-00001 b/tensorflow/core/profiler/internal/testdata/ckpt.data-00000-of-00001
index 045063943f..067f866c82 100644
--- a/tensorflow/core/profiler/internal/testdata/ckpt.data-00000-of-00001
+++ b/tensorflow/core/profiler/internal/testdata/ckpt.data-00000-of-00001
diff --git a/tensorflow/core/profiler/internal/testdata/ckpt.index b/tensorflow/core/profiler/internal/testdata/ckpt.index
index 908198167d..2097de8da2 100644
--- a/tensorflow/core/profiler/internal/testdata/ckpt.index
+++ b/tensorflow/core/profiler/internal/testdata/ckpt.index
diff --git a/tensorflow/core/profiler/internal/testdata/ckpt.meta b/tensorflow/core/profiler/internal/testdata/ckpt.meta
index 94fe29ad5c..b907e4ab50 100644
--- a/tensorflow/core/profiler/internal/testdata/ckpt.meta
+++ b/tensorflow/core/profiler/internal/testdata/ckpt.meta
diff --git a/tensorflow/core/profiler/internal/testdata/graph.pbtxt b/tensorflow/core/profiler/internal/testdata/graph.pbtxt
index e6fae2c4cf..62bba4a7bf 100644
--- a/tensorflow/core/profiler/internal/testdata/graph.pbtxt
+++ b/tensorflow/core/profiler/internal/testdata/graph.pbtxt
@@ -2,6 +2,27 @@ node {
   name: "zeros"
   op: "Const"
   attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 2
+          }
+          dim {
+            size: 6
+          }
+          dim {
+            size: 6
+          }
+          dim {
+            size: 3
+          }
+        }
+      }
+    }
+  }
+  attr {
     key: "dtype"
     value {
       type: DT_FLOAT
@@ -17,10 +38,10 @@ node {
             size: 2
           }
           dim {
-            size: 8
+            size: 6
           }
           dim {
-            size: 8
+            size: 6
           }
           dim {
             size: 3
@@ -32,13 +53,24 @@ node {
   }
 }
 node {
-  name: "conv2d/kernel/Initializer/random_uniform/shape"
+  name: "ScalarW/Initializer/random_normal/shape"
   op: "Const"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+          }
+        }
       }
     }
   }
@@ -55,22 +87,29 @@ node {
         dtype: DT_INT32
         tensor_shape {
           dim {
-            size: 4
           }
         }
-        tensor_content: "\003\000\000\000\003\000\000\000\003\000\000\000\005\000\000\000"
       }
     }
   }
 }
 node {
-  name: "conv2d/kernel/Initializer/random_uniform/min"
+  name: "ScalarW/Initializer/random_normal/mean"
   op: "Const"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+        }
       }
     }
   }
@@ -87,19 +126,28 @@ node {
         dtype: DT_FLOAT
         tensor_shape {
         }
-        float_val: -0.288675129414
+        float_val: 0.0
       }
     }
   }
 }
 node {
-  name: "conv2d/kernel/Initializer/random_uniform/max"
+  name: "ScalarW/Initializer/random_normal/stddev"
   op: "Const"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+        }
       }
     }
   }
@@ -116,15 +164,15 @@ node {
         dtype: DT_FLOAT
         tensor_shape {
         }
-        float_val: 0.288675129414
+        float_val: 0.0010000000475
       }
     }
   }
 }
 node {
-  name: "conv2d/kernel/Initializer/random_uniform/RandomUniform"
-  op: "RandomUniform"
-  input: "conv2d/kernel/Initializer/random_uniform/shape"
+  name: "ScalarW/Initializer/random_normal/RandomStandardNormal"
+  op: "RandomStandardNormal"
+  input: "ScalarW/Initializer/random_normal/shape"
   attr {
     key: "T"
     value {
@@ -135,7 +183,16 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+        }
       }
     }
   }
@@ -159,10 +216,10 @@ node {
   }
 }
 node {
-  name: "conv2d/kernel/Initializer/random_uniform/sub"
-  op: "Sub"
-  input: "conv2d/kernel/Initializer/random_uniform/max"
-  input: "conv2d/kernel/Initializer/random_uniform/min"
+  name: "ScalarW/Initializer/random_normal/mul"
+  op: "Mul"
+  input: "ScalarW/Initializer/random_normal/RandomStandardNormal"
+  input: "ScalarW/Initializer/random_normal/stddev"
   attr {
     key: "T"
     value {
@@ -173,36 +230,25 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
       }
     }
   }
-}
-node {
-  name: "conv2d/kernel/Initializer/random_uniform/mul"
-  op: "Mul"
-  input: "conv2d/kernel/Initializer/random_uniform/RandomUniform"
-  input: "conv2d/kernel/Initializer/random_uniform/sub"
   attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        shape {
+        }
       }
     }
   }
 }
 node {
-  name: "conv2d/kernel/Initializer/random_uniform"
+  name: "ScalarW/Initializer/random_normal"
   op: "Add"
-  input: "conv2d/kernel/Initializer/random_uniform/mul"
-  input: "conv2d/kernel/Initializer/random_uniform/min"
+  input: "ScalarW/Initializer/random_normal/mul"
+  input: "ScalarW/Initializer/random_normal/mean"
   attr {
     key: "T"
     value {
@@ -213,151 +259,37 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
       }
     }
   }
-}
-node {
-  name: "conv2d/kernel"
-  op: "VariableV2"
   attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d/kernel"
-      }
-    }
-  }
-  attr {
-    key: "container"
-    value {
-      s: ""
-    }
-  }
-  attr {
-    key: "dtype"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "shape"
-    value {
-      shape {
-        dim {
-          size: 3
-        }
-        dim {
-          size: 3
-        }
-        dim {
-          size: 3
-        }
-        dim {
-          size: 5
+        shape {
         }
       }
     }
   }
-  attr {
-    key: "shared_name"
-    value {
-      s: ""
-    }
-  }
 }
 node {
-  name: "conv2d/kernel/Assign"
-  op: "Assign"
-  input: "conv2d/kernel"
-  input: "conv2d/kernel/Initializer/random_uniform"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "_class"
-    value {
-      list {
-        s: "loc:@conv2d/kernel"
-      }
-    }
-  }
-  attr {
-    key: "use_locking"
-    value {
-      b: true
-    }
-  }
-  attr {
-    key: "validate_shape"
-    value {
-      b: true
-    }
-  }
-}
-node {
-  name: "conv2d/kernel/read"
-  op: "Identity"
-  input: "conv2d/kernel"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
+  name: "ScalarW"
+  op: "VariableV2"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@ScalarW"
       }
     }
   }
-}
-node {
-  name: "conv2d/bias/Initializer/Const"
-  op: "Const"
   attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d/bias"
-      }
-    }
-  }
-  attr {
-    key: "dtype"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "value"
-    value {
-      tensor {
-        dtype: DT_FLOAT
-        tensor_shape {
-          dim {
-            size: 5
-          }
+        shape {
         }
-        float_val: 0.0
-      }
-    }
-  }
-}
-node {
-  name: "conv2d/bias"
-  op: "VariableV2"
-  attr {
-    key: "_class"
-    value {
-      list {
-        s: "loc:@conv2d/bias"
       }
     }
   }
@@ -377,9 +309,6 @@ node {
     key: "shape"
     value {
       shape {
-        dim {
-          size: 5
-        }
       }
     }
   }
@@ -391,10 +320,10 @@ node {
   }
 }
 node {
-  name: "conv2d/bias/Assign"
+  name: "ScalarW/Assign"
   op: "Assign"
-  input: "conv2d/bias"
-  input: "conv2d/bias/Initializer/Const"
+  input: "ScalarW"
+  input: "ScalarW/Initializer/random_normal"
   attr {
     key: "T"
     value {
@@ -405,7 +334,16 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/bias"
+        s: "loc:@ScalarW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+        }
       }
     }
   }
@@ -423,9 +361,9 @@ node {
   }
 }
 node {
-  name: "conv2d/bias/read"
+  name: "ScalarW/read"
   op: "Identity"
-  input: "conv2d/bias"
+  input: "ScalarW"
   attr {
     key: "T"
     value {
@@ -436,38 +374,43 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/bias"
+        s: "loc:@ScalarW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+        }
       }
     }
   }
 }
 node {
-  name: "conv2d/convolution/Shape"
+  name: "DW/Initializer/random_normal/shape"
   op: "Const"
   attr {
-    key: "dtype"
+    key: "_class"
     value {
-      type: DT_INT32
+      list {
+        s: "loc:@DW"
+      }
     }
   }
   attr {
-    key: "value"
+    key: "_output_shapes"
     value {
-      tensor {
-        dtype: DT_INT32
-        tensor_shape {
+      list {
+        shape {
           dim {
             size: 4
           }
         }
-        tensor_content: "\003\000\000\000\003\000\000\000\003\000\000\000\005\000\000\000"
       }
     }
   }
-}
-node {
-  name: "conv2d/convolution/dilation_rate"
-  op: "Const"
   attr {
     key: "dtype"
     value {
@@ -481,142 +424,69 @@ node {
         dtype: DT_INT32
         tensor_shape {
           dim {
-            size: 2
+            size: 4
           }
         }
-        tensor_content: "\001\000\000\000\001\000\000\000"
+        tensor_content: "\003\000\000\000\003\000\000\000\003\000\000\000\006\000\000\000"
       }
     }
   }
 }
 node {
-  name: "conv2d/convolution"
-  op: "Conv2D"
-  input: "zeros"
-  input: "conv2d/kernel/read"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "data_format"
-    value {
-      s: "NHWC"
-    }
-  }
-  attr {
-    key: "padding"
-    value {
-      s: "VALID"
-    }
-  }
+  name: "DW/Initializer/random_normal/mean"
+  op: "Const"
   attr {
-    key: "strides"
+    key: "_class"
     value {
       list {
-        i: 1
-        i: 1
-        i: 1
-        i: 1
+        s: "loc:@DW"
       }
     }
   }
   attr {
-    key: "use_cudnn_on_gpu"
-    value {
-      b: true
-    }
-  }
-}
-node {
-  name: "conv2d/BiasAdd"
-  op: "BiasAdd"
-  input: "conv2d/convolution"
-  input: "conv2d/bias/read"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "data_format"
-    value {
-      s: "NHWC"
-    }
-  }
-}
-node {
-  name: "conv2d_1/kernel/Initializer/random_uniform/shape"
-  op: "Const"
-  attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        shape {
+        }
       }
     }
   }
   attr {
     key: "dtype"
     value {
-      type: DT_INT32
+      type: DT_FLOAT
     }
   }
   attr {
     key: "value"
     value {
       tensor {
-        dtype: DT_INT32
+        dtype: DT_FLOAT
         tensor_shape {
-          dim {
-            size: 4
-          }
         }
-        tensor_content: "\003\000\000\000\003\000\000\000\005\000\000\000\005\000\000\000"
+        float_val: 0.0
       }
     }
   }
 }
 node {
-  name: "conv2d_1/kernel/Initializer/random_uniform/min"
+  name: "DW/Initializer/random_normal/stddev"
   op: "Const"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
-      }
-    }
-  }
-  attr {
-    key: "dtype"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "value"
-    value {
-      tensor {
-        dtype: DT_FLOAT
-        tensor_shape {
-        }
-        float_val: -0.25819888711
+        s: "loc:@DW"
       }
     }
   }
-}
-node {
-  name: "conv2d_1/kernel/Initializer/random_uniform/max"
-  op: "Const"
   attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        shape {
+        }
       }
     }
   }
@@ -633,15 +503,15 @@ node {
         dtype: DT_FLOAT
         tensor_shape {
         }
-        float_val: 0.25819888711
+        float_val: 0.0010000000475
       }
     }
   }
 }
 node {
-  name: "conv2d_1/kernel/Initializer/random_uniform/RandomUniform"
-  op: "RandomUniform"
-  input: "conv2d_1/kernel/Initializer/random_uniform/shape"
+  name: "DW/Initializer/random_normal/RandomStandardNormal"
+  op: "RandomStandardNormal"
+  input: "DW/Initializer/random_normal/shape"
   attr {
     key: "T"
     value {
@@ -652,7 +522,28 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        s: "loc:@DW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
+          }
+        }
       }
     }
   }
@@ -676,10 +567,10 @@ node {
   }
 }
 node {
-  name: "conv2d_1/kernel/Initializer/random_uniform/sub"
-  op: "Sub"
-  input: "conv2d_1/kernel/Initializer/random_uniform/max"
-  input: "conv2d_1/kernel/Initializer/random_uniform/min"
+  name: "DW/Initializer/random_normal/mul"
+  op: "Mul"
+  input: "DW/Initializer/random_normal/RandomStandardNormal"
+  input: "DW/Initializer/random_normal/stddev"
   attr {
     key: "T"
     value {
@@ -690,36 +581,37 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        s: "loc:@DW"
       }
     }
   }
-}
-node {
-  name: "conv2d_1/kernel/Initializer/random_uniform/mul"
-  op: "Mul"
-  input: "conv2d_1/kernel/Initializer/random_uniform/RandomUniform"
-  input: "conv2d_1/kernel/Initializer/random_uniform/sub"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
   attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        shape {
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
+          }
+        }
       }
     }
   }
 }
 node {
-  name: "conv2d_1/kernel/Initializer/random_uniform"
+  name: "DW/Initializer/random_normal"
   op: "Add"
-  input: "conv2d_1/kernel/Initializer/random_uniform/mul"
-  input: "conv2d_1/kernel/Initializer/random_uniform/min"
+  input: "DW/Initializer/random_normal/mul"
+  input: "DW/Initializer/random_normal/mean"
   attr {
     key: "T"
     value {
@@ -730,19 +622,61 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        s: "loc:@DW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
+          }
+        }
       }
     }
   }
 }
 node {
-  name: "conv2d_1/kernel"
+  name: "DW"
   op: "VariableV2"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        s: "loc:@DW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
+          }
+        }
       }
     }
   }
@@ -769,10 +703,10 @@ node {
           size: 3
         }
         dim {
-          size: 5
+          size: 3
         }
         dim {
-          size: 5
+          size: 6
         }
       }
     }
@@ -785,10 +719,10 @@ node {
   }
 }
 node {
-  name: "conv2d_1/kernel/Assign"
+  name: "DW/Assign"
   op: "Assign"
-  input: "conv2d_1/kernel"
-  input: "conv2d_1/kernel/Initializer/random_uniform"
+  input: "DW"
+  input: "DW/Initializer/random_normal"
   attr {
     key: "T"
     value {
@@ -799,7 +733,28 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        s: "loc:@DW"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
+          }
+        }
       }
     }
   }
@@ -817,9 +772,9 @@ node {
   }
 }
 node {
-  name: "conv2d_1/kernel/read"
+  name: "DW/read"
   op: "Identity"
-  input: "conv2d_1/kernel"
+  input: "DW"
   attr {
     key: "T"
     value {
@@ -830,161 +785,117 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        s: "loc:@DW"
       }
     }
   }
-}
-node {
-  name: "conv2d_1/bias/Initializer/Const"
-  op: "Const"
   attr {
-    key: "_class"
+    key: "_output_shapes"
     value {
       list {
-        s: "loc:@conv2d_1/bias"
-      }
-    }
-  }
-  attr {
-    key: "dtype"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "value"
-    value {
-      tensor {
-        dtype: DT_FLOAT
-        tensor_shape {
+        shape {
           dim {
-            size: 5
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
           }
         }
-        float_val: 0.0
       }
     }
   }
 }
 node {
-  name: "conv2d_1/bias"
-  op: "VariableV2"
-  attr {
-    key: "_class"
-    value {
-      list {
-        s: "loc:@conv2d_1/bias"
-      }
-    }
-  }
-  attr {
-    key: "container"
-    value {
-      s: ""
-    }
-  }
+  name: "Conv2D"
+  op: "Conv2D"
+  input: "zeros"
+  input: "DW/read"
   attr {
-    key: "dtype"
+    key: "T"
     value {
       type: DT_FLOAT
     }
   }
   attr {
-    key: "shape"
+    key: "_output_shapes"
     value {
-      shape {
-        dim {
-          size: 5
+      list {
+        shape {
+          dim {
+            size: 2
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 3
+          }
+          dim {
+            size: 6
+          }
         }
       }
     }
   }
   attr {
-    key: "shared_name"
+    key: "data_format"
     value {
-      s: ""
+      s: "NHWC"
     }
   }
-}
-node {
-  name: "conv2d_1/bias/Assign"
-  op: "Assign"
-  input: "conv2d_1/bias"
-  input: "conv2d_1/bias/Initializer/Const"
   attr {
-    key: "T"
+    key: "padding"
     value {
-      type: DT_FLOAT
+      s: "SAME"
     }
   }
   attr {
-    key: "_class"
+    key: "strides"
     value {
       list {
-        s: "loc:@conv2d_1/bias"
+        i: 1
+        i: 2
+        i: 2
+        i: 1
       }
     }
   }
   attr {
-    key: "use_locking"
-    value {
-      b: true
-    }
-  }
-  attr {
-    key: "validate_shape"
+    key: "use_cudnn_on_gpu"
     value {
       b: true
     }
   }
 }
 node {
-  name: "conv2d_1/bias/read"
-  op: "Identity"
-  input: "conv2d_1/bias"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
+  name: "DW2/Initializer/random_normal/shape"
+  op: "Const"
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/bias"
+        s: "loc:@DW2"
       }
     }
   }
-}
-node {
-  name: "conv2d_2/convolution/Shape"
-  op: "Const"
-  attr {
-    key: "dtype"
-    value {
-      type: DT_INT32
-    }
-  }
   attr {
-    key: "value"
+    key: "_output_shapes"
     value {
-      tensor {
-        dtype: DT_INT32
-        tensor_shape {
+      list {
+        shape {
           dim {
             size: 4
           }
         }
-        tensor_content: "\003\000\000\000\003\000\000\000\005\000\000\000\005\000\000\000"
       }
     }
   }
-}
-node {
-  name: "conv2d_2/convolution/dilation_rate"
-  op: "Const"
   attr {
     key: "dtype"
     value {
@@ -998,258 +909,153 @@ node {
         dtype: DT_INT32
         tensor_shape {
           dim {
-            size: 2
+            size: 4
           }
         }
-        tensor_content: "\001\000\000\000\001\000\000\000"
+        tensor_content: "\002\000\000\000\002\000\000\000\006\000\000\000\014\000\000\000"
       }
     }
   }
 }
 node {
-  name: "conv2d_2/convolution"
-  op: "Conv2D"
-  input: "conv2d/BiasAdd"
-  input: "conv2d_1/kernel/read"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "data_format"
-    value {
-      s: "NHWC"
-    }
-  }
-  attr {
-    key: "padding"
-    value {
-      s: "VALID"
-    }
-  }
+  name: "DW2/Initializer/random_normal/mean"
+  op: "Const"
   attr {
-    key: "strides"
+    key: "_class"
     value {
       list {
-        i: 1
-        i: 1
-        i: 1
-        i: 1
+        s: "loc:@DW2"
       }
     }
   }
   attr {
-    key: "use_cudnn_on_gpu"
-    value {
-      b: true
-    }
-  }
-}
-node {
-  name: "conv2d_2/BiasAdd"
-  op: "BiasAdd"
-  input: "conv2d_2/convolution"
-  input: "conv2d_1/bias/read"
-  attr {
-    key: "T"
-    value {
-      type: DT_FLOAT
-    }
-  }
-  attr {
-    key: "data_format"
+    key: "_output_shapes"
     value {
-      s: "NHWC"
+      list {
+        shape {
+        }
+      }
     }
   }
-}
-node {
-  name: "save/Const"
-  op: "Const"
   attr {
     key: "dtype"
     value {
-      type: DT_STRING
+      type: DT_FLOAT
     }
   }
   attr {
     key: "value"
     value {
       tensor {
-        dtype: DT_STRING
+        dtype: DT_FLOAT
         tensor_shape {
         }
-        string_val: "model"
+        float_val: 0.0
       }
     }
   }
 }
 node {
-  name: "save/SaveV2/tensor_names"
+  name: "DW2/Initializer/random_normal/stddev"
   op: "Const"
   attr {
-    key: "dtype"
+    key: "_class"
     value {
-      type: DT_STRING
+      list {
+        s: "loc:@DW2"
+      }
     }
   }
   attr {
-    key: "value"
+    key: "_output_shapes"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
-          dim {
-            size: 4
-          }
+      list {
+        shape {
         }
-        string_val: "conv2d/bias"
-        string_val: "conv2d/kernel"
-        string_val: "conv2d_1/bias"
-        string_val: "conv2d_1/kernel"
       }
     }
   }
-}
-node {
-  name: "save/SaveV2/shape_and_slices"
-  op: "Const"
   attr {
     key: "dtype"
     value {
-      type: DT_STRING
+      type: DT_FLOAT
     }
   }
   attr {
     key: "value"
     value {
       tensor {
-        dtype: DT_STRING
+        dtype: DT_FLOAT
         tensor_shape {
-          dim {
-            size: 4
-          }
         }
-        string_val: ""
-        string_val: ""
-        string_val: ""
-        string_val: ""
-      }
-    }
-  }
-}
-node {
-  name: "save/SaveV2"
-  op: "SaveV2"
-  input: "save/Const"
-  input: "save/SaveV2/tensor_names"
-  input: "save/SaveV2/shape_and_slices"
-  input: "conv2d/bias"
-  input: "conv2d/kernel"
-  input: "conv2d_1/bias"
-  input: "conv2d_1/kernel"
-  attr {
-    key: "dtypes"
-    value {
-      list {
-        type: DT_FLOAT
-        type: DT_FLOAT
-        type: DT_FLOAT
-        type: DT_FLOAT
+        float_val: 0.0010000000475
       }
     }
   }
 }
 node {
-  name: "save/control_dependency"
-  op: "Identity"
-  input: "save/Const"
-  input: "^save/SaveV2"
+  name: "DW2/Initializer/random_normal/RandomStandardNormal"
+  op: "RandomStandardNormal"
+  input: "DW2/Initializer/random_normal/shape"
   attr {
     key: "T"
     value {
-      type: DT_STRING
+      type: DT_INT32
     }
   }
   attr {
     key: "_class"
     value {
       list {
-        s: "loc:@save/Const"
+        s: "loc:@DW2"
       }
     }
   }
-}
-node {
-  name: "save/RestoreV2/tensor_names"
-  op: "Const"
   attr {
-    key: "dtype"
+    key: "_output_shapes"
     value {
-      type: DT_STRING
-    }
-  }
-  attr {
-    key: "value"
-    value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
+      list {
+        shape {
+          dim {
+            size: 2
+          }
+          dim {
+            size: 2
+          }
           dim {
-            size: 1
+            size: 6
+          }
+          dim {
+            size: 12
           }
         }
-        string_val: "conv2d/bias"
       }
     }
   }
-}
-node {
-  name: "save/RestoreV2/shape_and_slices"
-  op: "Const"
   attr {
     key: "dtype"
     value {
-      type: DT_STRING
+      type: DT_FLOAT
     }
   }
   attr {
-    key: "value"
+    key: "seed"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
-          dim {
-            size: 1
-          }
-        }
-        string_val: ""
-      }
+      i: 0
     }
   }
-}
-node {
-  name: "save/RestoreV2"
-  op: "RestoreV2"
-  input: "save/Const"
-  input: "save/RestoreV2/tensor_names"
-  input: "save/RestoreV2/shape_and_slices"
   attr {
-    key: "dtypes"
+    key: "seed2"
     value {
-      list {
-        type: DT_FLOAT
-      }
+      i: 0
     }
   }
 }
 node {
-  name: "save/Assign"
-  op: "Assign"
-  input: "conv2d/bias"
-  input: "save/RestoreV2"
+  name: "DW2/Initializer/random_normal/mul"
+  op: "Mul"
+  input: "DW2/Initializer/random_normal/RandomStandardNormal"
+  input: "DW2/Initializer/random_normal/stddev"
   attr {
     key: "T"
     value {
@@ -1260,91 +1066,37 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/bias"
+        s: "loc:@DW2"
       }
     }
   }
   attr {
-    key: "use_locking"
-    value {
-      b: true
-    }
-  }
-  attr {
-    key: "validate_shape"
-    value {
-      b: true
-    }
-  }
-}
-node {
-  name: "save/RestoreV2_1/tensor_names"
-  op: "Const"
-  attr {
-    key: "dtype"
-    value {
-      type: DT_STRING
-    }
-  }
-  attr {
-    key: "value"
+    key: "_output_shapes"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
+      list {
+        shape {
           dim {
-            size: 1
+            size: 2
+          }
+          dim {
+            size: 2
+          }
+          dim {
+            size: 6
           }
-        }
-        string_val: "conv2d/kernel"
-      }
-    }
-  }
-}
-node {
-  name: "save/RestoreV2_1/shape_and_slices"
-  op: "Const"
-  attr {
-    key: "dtype"
-    value {
-      type: DT_STRING
-    }
-  }
-  attr {
-    key: "value"
-    value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
           dim {
-            size: 1
+            size: 12
           }
         }
-        string_val: ""
       }
     }
   }
 }
 node {
-  name: "save/RestoreV2_1"
-  op: "RestoreV2"
-  input: "save/Const"
-  input: "save/RestoreV2_1/tensor_names"
-  input: "save/RestoreV2_1/shape_and_slices"
-  attr {
-    key: "dtypes"
-    value {
-      list {
-        type: DT_FLOAT
-      }
-    }
-  }
-}
-node {
-  name: "save/Assign_1"
-  op: "Assign"
-  input: "conv2d/kernel"
-  input: "save/RestoreV2_1"
+  name: "DW2/Initializer/random_normal"
+  op: "Add"
+  input: "DW2/Initializer/random_normal/mul"
+  input: "DW2/Initializer/random_normal/mean"
   attr {
     key: "T"
     value {
@@ -1355,91 +1107,107 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d/kernel"
+        s: "loc:@DW2"
       }
     }
   }
   attr {
-    key: "use_locking"
-    value {
-      b: true
-    }
-  }
-  attr {
-    key: "validate_shape"
+    key: "_output_shapes"
     value {
-      b: true
+      list {
+        shape {
+          dim {
+            size: 2
+          }
+          dim {
+            size: 2
+          }
+          dim {
+            size: 6
+          }
+          dim {
+            size: 12
+          }
+        }
+      }
     }
   }
 }
 node {
-  name: "save/RestoreV2_2/tensor_names"
-  op: "Const"
+  name: "DW2"
+  op: "VariableV2"
   attr {
-    key: "dtype"
+    key: "_class"
     value {
-      type: DT_STRING
+      list {
+        s: "loc:@DW2"
+      }
     }
   }
   attr {
-    key: "value"
+    key: "_output_shapes"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
+      list {
+        shape {
           dim {
-            size: 1
+            size: 2
+          }
+          dim {
+            size: 2
+          }
+          dim {
+            size: 6
+          }
+          dim {
+            size: 12
           }
         }
-        string_val: "conv2d_1/bias"
       }
     }
   }
-}
-node {
-  name: "save/RestoreV2_2/shape_and_slices"
-  op: "Const"
+  attr {
+    key: "container"
+    value {
+      s: ""
+    }
+  }
   attr {
     key: "dtype"
     value {
-      type: DT_STRING
+      type: DT_FLOAT
     }
   }
   attr {
-    key: "value"
+    key: "shape"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
-          dim {
-            size: 1
-          }
+      shape {
+        dim {
+          size: 2
+        }
+        dim {
+          size: 2
+        }
+        dim {
+          size: 6
+        }
+        dim {
+          size: 12
         }
-        string_val: ""
       }
     }
   }
-}
-node {
-  name: "save/RestoreV2_2"
-  op: "RestoreV2"
-  input: "save/Const"
-  input: "save/RestoreV2_2/tensor_names"
-  input: "save/RestoreV2_2/shape_and_slices"
   attr {
-    key: "dtypes"
+    key: "shared_name"
     value {
-      list {
-        type: DT_FLOAT
-      }
+      s: ""
     }
   }
 }
 node {
-  name: "save/Assign_2"
+  name: "DW2/Assign"
   op: "Assign"
-  input: "conv2d_1/bias"
-  input: "save/RestoreV2_2"
+  input: "DW2"
+  input: "DW2/Initializer/random_normal"
   attr {
     key: "T"
     value {
@@ -1450,7 +1218,28 @@ node {
     key: "_class"
     value {
       list {
-        s: "loc:@conv2d_1/bias"
+        s: "loc:@DW2"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 2
+          }
+          dim {
+            size: 2
+          }
+          dim {
+            size: 6
+          }
+          dim {
+            size: 12
+          }
+        }
       }
     }
   }
@@ -1468,116 +1257,114 @@ node {
   }
 }
 node {
-  name: "save/RestoreV2_3/tensor_names"
-  op: "Const"
+  name: "DW2/read"
+  op: "Identity"
+  input: "DW2"
   attr {
-    key: "dtype"
+    key: "T"
     value {
-      type: DT_STRING
+      type: DT_FLOAT
     }
   }
   attr {
-    key: "value"
+    key: "_class"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
+      list {
+        s: "loc:@DW2"
+      }
+    }
+  }
+  attr {
+    key: "_output_shapes"
+    value {
+      list {
+        shape {
+          dim {
+            size: 2
+          }
+          dim {
+            size: 2
+          }
           dim {
-            size: 1
+            size: 6
+          }
+          dim {
+            size: 12
           }
         }
-        string_val: "conv2d_1/kernel"
       }
     }
   }
 }
 node {
-  name: "save/RestoreV2_3/shape_and_slices"
-  op: "Const"
+  name: "Conv2D_1"
+  op: "Conv2D"
+  input: "Conv2D"
+  input: "DW2/read"
   attr {
-    key: "dtype"
+    key: "T"
     value {
-      type: DT_STRING
+      type: DT_FLOAT
     }
   }
   attr {
-    key: "value"
+    key: "_output_shapes"
     value {
-      tensor {
-        dtype: DT_STRING
-        tensor_shape {
+      list {
+        shape {
           dim {
-            size: 1
+            size: 2
+          }
+          dim {
+            size: 2
+          }
+          dim {
+            size: 2
+          }
+          dim {
+            size: 12
           }
         }
-        string_val: ""
       }
     }
   }
-}
-node {
-  name: "save/RestoreV2_3"
-  op: "RestoreV2"
-  input: "save/Const"
-  input: "save/RestoreV2_3/tensor_names"
-  input: "save/RestoreV2_3/shape_and_slices"
   attr {
-    key: "dtypes"
+    key: "data_format"
     value {
-      list {
-        type: DT_FLOAT
-      }
+      s: "NHWC"
     }
   }
-}
-node {
-  name: "save/Assign_3"
-  op: "Assign"
-  input: "conv2d_1/kernel"
-  input: "save/RestoreV2_3"
   attr {
-    key: "T"
+    key: "padding"
     value {
-      type: DT_FLOAT
+      s: "SAME"
     }
   }
   attr {
-    key: "_class"
+    key: "strides"
     value {
       list {
-        s: "loc:@conv2d_1/kernel"
+        i: 1
+        i: 2
+        i: 2
+        i: 1
       }
     }
   }
   attr {
-    key: "use_locking"
-    value {
-      b: true
-    }
-  }
-  attr {
-    key: "validate_shape"
+    key: "use_cudnn_on_gpu"
     value {
       b: true
     }
   }
 }
 node {
-  name: "save/restore_all"
-  op: "NoOp"
-  input: "^save/Assign"
-  input: "^save/Assign_1"
-  input: "^save/Assign_2"
-  input: "^save/Assign_3"
-}
-node {
   name: "init"
   op: "NoOp"
-  input: "^conv2d/kernel/Assign"
-  input: "^conv2d/bias/Assign"
-  input: "^conv2d_1/kernel/Assign"
-  input: "^conv2d_1/bias/Assign"
+  input: "^ScalarW/Assign"
+  input: "^DW/Assign"
+  input: "^DW2/Assign"
 }
 versions {
-  producer: 21
+  producer: 24
 }
diff --git a/tensorflow/core/profiler/internal/testdata/run_meta b/tensorflow/core/profiler/internal/testdata/run_meta
index 6e9e0c3872..ae76acb743 100644
--- a/tensorflow/core/profiler/internal/testdata/run_meta
+++ b/tensorflow/core/profiler/internal/testdata/run_meta
diff --git a/tensorflow/core/profiler/internal/testdata/tfprof_log b/tensorflow/core/profiler/internal/testdata/tfprof_log
index 2a317207c4..e1c3693d2b 100644
--- a/tensorflow/core/profiler/internal/testdata/tfprof_log
+++ b/tensorflow/core/profiler/internal/testdata/tfprof_log
[Binary test data (serialized protobuf), contents not shown. Readable names in the old file: conv2d/BiasAdd, conv2d_2/BiasAdd, conv2d/convolution, conv2d_2/convolution and the conv2d*/bias and conv2d*/kernel trainable variables. Readable names in the new file: DW2, ScalarW, DW (trainable variables), Conv2D and Conv2D_1.]
diff --git a/tensorflow/core/profiler/internal/tfprof_code.cc b/tensorflow/core/profiler/internal/tfprof_code.cc
index 17c51bed9f..1c512a7ca1 100644
--- a/tensorflow/core/profiler/internal/tfprof_code.cc
+++ b/tensorflow/core/profiler/internal/tfprof_code.cc
@@ -191,6 +191,13 @@ class Samples {
         } else if (type == kShown[0]) {
           sample_pb->mutable_value()->Add(
               gn->requested_bytes(node->node->step()));
+        } else if (type == kShown[11]) {
+          sample_pb->mutable_value()->Add(gn->peak_bytes(node->node->step()));
+        } else if (type == kShown[12]) {
+          sample_pb->mutable_value()->Add(
+              gn->residual_bytes(node->node->step()));
+        } else if (type == kShown[13]) {
+          sample_pb->mutable_value()->Add(gn->output_bytes(node->node->step()));
         } else if (type == kShown[2]) {
           sample_pb->mutable_value()->Add(gn->parameters());
         } else if (type == kShown[3]) {
@@ -296,9 +303,21 @@ class PprofProfileImpl : public PprofProfile {
             string_table_.GetIndex("CPU execution time."));
       }
     } else if (type == kShown[0]) {
-      sample_type->set_unit(string_table_.GetIndex("bytes"));
+      sample_type->set_unit(string_table_.GetIndex("requested bytes"));
+      profile_pb->mutable_comment()->Add(
+          string_table_.GetIndex("Sum of operation total requested memory."));
+    } else if (type == kShown[11]) {
+      sample_type->set_unit(string_table_.GetIndex("peak bytes"));
+      profile_pb->mutable_comment()->Add(
+          string_table_.GetIndex("Sum of operation peak memory usage."));
+    } else if (type == kShown[12]) {
+      sample_type->set_unit(string_table_.GetIndex("residual bytes"));
+      profile_pb->mutable_comment()->Add(string_table_.GetIndex(
+          "Sum of operation allocated memory after finish."));
+    } else if (type == kShown[13]) {
+      sample_type->set_unit(string_table_.GetIndex("output bytes"));
       profile_pb->mutable_comment()->Add(
-          string_table_.GetIndex("Sum of operation output memory."));
+          string_table_.GetIndex("Sum of operation output size."));
     } else if (type == kShown[2]) {
       sample_type->set_unit(string_table_.GetIndex("count"));
       profile_pb->mutable_comment()->Add(
@@ -370,7 +389,8 @@ const ShowMultiNode* TFCode::ShowInternal(const Options& opts,
     }
     string select = *opts.select.begin();
     if (select != kShown[0] && select != kShown[1] && select != kShown[2] &&
-        select != kShown[3] && select != kShown[9] && select != kShown[10]) {
+        select != kShown[3] && select != kShown[9] && select != kShown[10] &&
+        select != kShown[11] && select != kShown[12] && select != kShown[13]) {
       fprintf(stderr, "pprof doesn't support -select=%s\n", select.c_str());
       return root_.get();
     }
@@ -522,17 +542,37 @@ std::vector<CodeNode*> TFCode::Account(const std::vector<CodeNode*>& roots,
   return act_nodes;
 }
 
-string TFCode::FormatNode(CodeNode* node, const Options& opts, int64 indent) {
+string TFCode::FormatNodeMemory(CodeNode* node, int64 bytes,
+                                int64 total_bytes) const {
+  string memory = FormatMemory(total_bytes);
+  if (node->account) {
+    memory = FormatMemory(bytes) + "/" + memory;
+  } else {
+    memory = "--/" + memory;
+  }
+  return memory;
+}
+
+string TFCode::FormatNode(CodeNode* node, const Options& opts,
+                          int64 indent) const {
   std::vector<string> attrs;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
-    string memory = FormatMemory(node->proto().total_requested_bytes());
-    if (node->account) {
-      memory = FormatMemory(node->proto().requested_bytes()) + "/" + memory;
-    } else {
-      memory = "--/" + memory;
-    }
-    attrs.push_back(memory);
+    attrs.push_back(FormatNodeMemory(node, node->proto().requested_bytes(),
+                                     node->proto().total_requested_bytes()));
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    attrs.push_back(FormatNodeMemory(node, node->proto().peak_bytes(),
+                                     node->proto().total_peak_bytes()));
   }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    attrs.push_back(FormatNodeMemory(node, node->proto().residual_bytes(),
+                                     node->proto().total_residual_bytes()));
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
+    attrs.push_back(FormatNodeMemory(node, node->proto().output_bytes(),
+                                     node->proto().total_output_bytes()));
+  }
+
   std::vector<string> time_attrs = FormatTimes(node, opts);
   attrs.insert(attrs.end(), time_attrs.begin(), time_attrs.end());
 
diff --git a/tensorflow/core/profiler/internal/tfprof_code.h b/tensorflow/core/profiler/internal/tfprof_code.h
index 7583a43a26..5e64104d9f 100644
--- a/tensorflow/core/profiler/internal/tfprof_code.h
+++ b/tensorflow/core/profiler/internal/tfprof_code.h
@@ -79,7 +79,8 @@ class TFCode : public TFMultiShow {
               const Options& opts, string* display_str,
               MultiGraphNodeProto* proto, std::vector<uint64>* call_ids);
 
-  string FormatNode(CodeNode* node, const Options& opts, int64 indent);
+  string FormatNode(CodeNode* node, const Options& opts, int64 indent) const;
+  string FormatNodeMemory(CodeNode* node, int64 bytes, int64 total_bytes) const;
 
   std::unique_ptr<CodeNode> root_;
   std::unique_ptr<TFMultiGraphNode> graph_root_;
diff --git a/tensorflow/core/profiler/internal/tfprof_node.cc b/tensorflow/core/profiler/internal/tfprof_node.cc
index 732576d29c..69198019cd 100644
--- a/tensorflow/core/profiler/internal/tfprof_node.cc
+++ b/tensorflow/core/profiler/internal/tfprof_node.cc
@@ -110,9 +110,11 @@ void ExecStep::AddMemoryStats(const string& dev,
       uint64 output_ptr =
           output.tensor_description().allocation_description().ptr();
       total_output_bytes += output_bytes;
-      output_bytes_[output.slot()] = std::make_pair(output_bytes, output_ptr);
+      output_memory_[output.slot()] = std::make_pair(output_bytes, output_ptr);
     }
   }
+  output_bytes_ = total_output_bytes;
+
   if (step_stat.has_memory_stats()) {
     host_temp_bytes_ += step_stat.memory_stats().host_temp_memory_size();
     host_persistent_bytes_ +=
@@ -122,7 +124,17 @@ void ExecStep::AddMemoryStats(const string& dev,
     accelerator_persistent_bytes_ +=
         step_stat.memory_stats().device_persistent_memory_size();
   }
-  requested_bytes_ = total_output_bytes;
+  int64 residual_bytes = 0;
+  int64 requested_bytes = 0;
+  int64 peak_bytes = 0;
+  for (const auto& mem : step_stat.memory()) {
+    residual_bytes += mem.live_bytes();
+    requested_bytes += mem.total_bytes();
+    peak_bytes += mem.peak_bytes();
+  }
+  requested_bytes_ = requested_bytes;
+  residual_bytes_ = residual_bytes;
+  peak_bytes_ = peak_bytes;
 }
 
 void TFGraphNode::AddStepStat(int64 step, const string& device,
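The hunk above stops treating `requested_bytes_` as an alias for total output bytes and instead derives three numbers from the per-allocator records in `step_stat.memory()`. A minimal sketch of that summation follows. `AllocRecord`, `OpMemory` and `Summarize` are hypothetical names used only for illustration, and the field comments are a reading of the field names, not something this diff states.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one entry of step_stat.memory(); only the three
// fields read in the hunk above are modeled. Meanings are assumed from the
// field names.
struct AllocRecord {
  int64_t total_bytes;  // bytes the op requested from this allocator
  int64_t peak_bytes;   // high-water mark while the op ran
  int64_t live_bytes;   // bytes still allocated when the op finished
};

// Hypothetical result type holding the three per-op totals.
struct OpMemory {
  int64_t requested_bytes = 0;
  int64_t peak_bytes = 0;
  int64_t residual_bytes = 0;
};

// Sums the records field by field, as the loop in AddMemoryStats does.
OpMemory Summarize(const std::vector<AllocRecord>& records) {
  OpMemory out;
  for (const AllocRecord& r : records) {
    out.requested_bytes += r.total_bytes;
    out.peak_bytes += r.peak_bytes;
    out.residual_bytes += r.live_bytes;
  }
  return out;
}
```

Because the peaks are added across allocators, the resulting `peak_bytes` is an upper bound on the op's simultaneous peak, not necessarily the exact value.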
diff --git a/tensorflow/core/profiler/internal/tfprof_node.h b/tensorflow/core/profiler/internal/tfprof_node.h
index 929ee3f50c..5ec3da12cf 100644
--- a/tensorflow/core/profiler/internal/tfprof_node.h
+++ b/tensorflow/core/profiler/internal/tfprof_node.h
@@ -51,6 +51,9 @@ class ExecStep {
         latest_end_micros_(0),
         mem_initiated_(false),
         requested_bytes_(0),
+        peak_bytes_(0),
+        residual_bytes_(0),
+        output_bytes_(0),
         host_temp_bytes_(0),
         host_persistent_bytes_(0),
         accelerator_temp_bytes_(0),
@@ -78,14 +81,17 @@ class ExecStep {
   int64 latest_end_micros() const { return latest_end_micros_; }
 
   int64 requested_bytes() const { return requested_bytes_; }
+  int64 peak_bytes() const { return peak_bytes_; }
+  int64 residual_bytes() const { return residual_bytes_; }
+  int64 output_bytes() const { return output_bytes_; }
   int64 accelerator_temp_bytes() const { return accelerator_temp_bytes_; }
   int64 host_temp_bytes() const { return host_temp_bytes_; }
   int64 accelerator_persistent_bytes() const {
     return accelerator_persistent_bytes_;
   }
   int64 host_persistent_bytes() const { return host_persistent_bytes_; }
-  const std::map<int64, std::pair<int64, uint64>>& output_bytes() const {
-    return output_bytes_;
+  const std::map<int64, std::pair<int64, uint64>>& output_memory() const {
+    return output_memory_;
   }
   int64 allocator_bytes_in_use() const { return allocator_bytes_in_use_; }
 
@@ -111,8 +117,14 @@ class ExecStep {
   std::set<string> devices_;
 
   bool mem_initiated_;
-  // Total output bytes requested by the op.
+  // Total bytes requested by the op.
   int64 requested_bytes_;
+  // Peak bytes held by the op while it ran (sum of allocator peak_bytes).
+  int64 peak_bytes_;
+  // Total bytes requested by the op and not released after op end.
+  int64 residual_bytes_;
+  // Total bytes output by the op (not necessarily requested by the op).
+  int64 output_bytes_;
   // Total temporary bytes allocated and released by the op.
   int64 host_temp_bytes_;
   // Total persistent bytes (e.g. variable) allocated by the op.
@@ -122,9 +134,27 @@ class ExecStep {
   // The total number of bytes currently allocated by the allocator if >0.
   int64 allocator_bytes_in_use_;
   // output_idx -> {output_bytes, memory_ptr}
-  std::map<int64, std::pair<int64, uint64>> output_bytes_;
+  std::map<int64, std::pair<int64, uint64>> output_memory_;
 };
 
+#define GRAPH_NODE_BYTES(type)                                \
+  do {                                                        \
+    if (execs_.empty()) {                                     \
+      return 0;                                               \
+    }                                                         \
+    if (step >= 0) {                                          \
+      auto exec = execs_.find(step);                          \
+      CHECK(exec != execs_.end()) << "unknown step " << step; \
+      return exec->second.type##_bytes();                     \
+    }                                                         \
+                                                              \
+    int64 bytes = 0;                                          \
+    for (const auto& exec : execs_) {                         \
+      bytes += exec.second.type##_bytes();                    \
+    }                                                         \
+    return bytes / execs_.size();                             \
+  } while (0)
+
 class TFGraphNode {
  public:
   TFGraphNode(const NodeDef* node)
@@ -270,22 +300,10 @@ class TFGraphNode {
     return total_micros / execs_.size();
   }
 
-  int64 requested_bytes(int64 step) const {
-    if (execs_.empty()) {
-      return 0;
-    }
-    if (step >= 0) {
-      auto exec = execs_.find(step);
-      CHECK(exec != execs_.end()) << "unknown step " << step;
-      return exec->second.requested_bytes();
-    }
-
-    int64 requested_bytes = 0;
-    for (const auto& exec : execs_) {
-      requested_bytes += exec.second.requested_bytes();
-    }
-    return requested_bytes / execs_.size();
-  }
+  int64 requested_bytes(int64 step) const { GRAPH_NODE_BYTES(requested); }
+  int64 peak_bytes(int64 step) const { GRAPH_NODE_BYTES(peak); }
+  int64 residual_bytes(int64 step) const { GRAPH_NODE_BYTES(residual); }
+  int64 output_bytes(int64 step) const { GRAPH_NODE_BYTES(output); }
 
   int64 all_start_micros(int64 step) const {
     auto exec = execs_.find(step);
@@ -328,11 +346,11 @@ class TFGraphNode {
     CHECK(exec != execs_.end()) << "unknown step " << step;
     return exec->second.host_persistent_bytes();
   }
-  const std::map<int64, std::pair<int64, uint64>>& output_bytes(
+  const std::map<int64, std::pair<int64, uint64>>& output_memory(
       int64 step) const {
     auto exec = execs_.find(step);
     CHECK(exec != execs_.end()) << "unknown step " << step;
-    return exec->second.output_bytes();
+    return exec->second.output_memory();
   }
   int64 allocator_bytes_in_use(int64 step) const {
     auto exec = execs_.find(step);
@@ -427,6 +445,9 @@ class TFMultiGraphNode {
         accelerator_exec_micros_(0),
         cpu_exec_micros_(0),
         requested_bytes_(0),
+        peak_bytes_(0),
+        residual_bytes_(0),
+        output_bytes_(0),
         float_ops_(0),
         parameters_(0) {}
 
@@ -437,6 +458,10 @@ class TFMultiGraphNode {
     cpu_exec_micros_ = 0;
 
     requested_bytes_ = 0;
+    peak_bytes_ = 0;
+    residual_bytes_ = 0;
+    output_bytes_ = 0;
+
     float_ops_ = 0;
     parameters_ = 0;
     op_types_.clear();
@@ -460,6 +485,10 @@ class TFMultiGraphNode {
       cpu_exec_micros_ += node->cpu_exec_micros(step);
 
       requested_bytes_ += node->requested_bytes(step);
+      peak_bytes_ += node->peak_bytes(step);
+      residual_bytes_ += node->residual_bytes(step);
+      output_bytes_ += node->output_bytes(step);
+
       float_ops_ += node->float_ops(step);
       parameters_ += node->parameters();
       if (node->shape().size() > 0) {
@@ -492,6 +521,9 @@ class TFMultiGraphNode {
   int64 cpu_exec_micros() const { return cpu_exec_micros_; }
 
   int64 requested_bytes() const { return requested_bytes_; }
+  int64 peak_bytes() const { return peak_bytes_; }
+  int64 residual_bytes() const { return residual_bytes_; }
+  int64 output_bytes() const { return output_bytes_; }
 
   int64 float_ops() const { return float_ops_; }
 
@@ -540,6 +572,9 @@ class TFMultiGraphNode {
   int64 cpu_exec_micros_;
 
   int64 requested_bytes_;
+  int64 peak_bytes_;
+  int64 residual_bytes_;
+  int64 output_bytes_;
   int64 float_ops_;
   int64 parameters_;
   std::set<string> devices_;
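The `GRAPH_NODE_BYTES` macro in this file gives all four accessors the same step semantics: a non-negative `step` selects that step's value, and a negative `step` returns the average over all recorded steps. The sketch below restates that logic as a plain function. `StepBytes` and `BytesForStep` are hypothetical names, and a `std::map` of integers stands in for `execs_` and the `ExecStep` accessor.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical simplification: step id -> bytes recorded for that step.
using StepBytes = std::map<int64_t, int64_t>;

int64_t BytesForStep(const StepBytes& execs, int64_t step) {
  if (execs.empty()) return 0;
  if (step >= 0) {
    auto it = execs.find(step);
    assert(it != execs.end() && "unknown step");  // the macro uses CHECK here
    return it->second;
  }
  int64_t bytes = 0;
  for (const auto& exec : execs) bytes += exec.second;
  // Integer division, as in the macro: the average is truncated.
  return bytes / static_cast<int64_t>(execs.size());
}
```

With steps `{0: 100, 1: 201}`, asking for step 1 returns 201 and asking for step -1 returns 150, since 301 / 2 is truncated.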
diff --git a/tensorflow/core/profiler/internal/tfprof_node_show.cc b/tensorflow/core/profiler/internal/tfprof_node_show.cc
index 16b94fdfa1..b0f8dcbf3b 100644
--- a/tensorflow/core/profiler/internal/tfprof_node_show.cc
+++ b/tensorflow/core/profiler/internal/tfprof_node_show.cc
@@ -38,6 +38,10 @@ void ShowNode::ReInit(int64 step) {
   mutable_proto()->set_cpu_exec_micros(node->cpu_exec_micros(step));
 
   mutable_proto()->set_requested_bytes(node->requested_bytes(step));
+  mutable_proto()->set_peak_bytes(node->peak_bytes(step));
+  mutable_proto()->set_residual_bytes(node->residual_bytes(step));
+  mutable_proto()->set_output_bytes(node->output_bytes(step));
+
   mutable_proto()->set_float_ops(node->float_ops(step));
 
   mutable_proto()->clear_input_shapes();
@@ -68,6 +72,12 @@ void ShowNode::AggregateTotalStats(ShowNode* node) {
 
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              node_pb->total_requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        node_pb->total_peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            node_pb->total_residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          node_pb->total_output_bytes());
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         node_pb->total_parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -89,6 +99,13 @@ void ShowNode::AddSelfToTotalStats() {
 
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              proto().requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        proto().peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            proto().residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          proto().output_bytes());
+
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         proto().parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -105,6 +122,10 @@ void ShowNode::ResetTotalStats() {
   mutable_proto()->set_total_cpu_exec_micros(0);
 
   mutable_proto()->set_total_requested_bytes(0);
+  mutable_proto()->set_total_peak_bytes(0);
+  mutable_proto()->set_total_residual_bytes(0);
+  mutable_proto()->set_total_output_bytes(0);
+
   mutable_proto()->set_total_parameters(0);
   mutable_proto()->set_total_float_ops(0);
   mutable_proto()->mutable_children()->Clear();
@@ -135,6 +156,10 @@ bool ShowMultiNode::ReInit(int64 step,
   mutable_proto()->set_cpu_exec_micros(node->cpu_exec_micros());
 
   mutable_proto()->set_requested_bytes(node->requested_bytes());
+  mutable_proto()->set_peak_bytes(node->peak_bytes());
+  mutable_proto()->set_residual_bytes(node->residual_bytes());
+  mutable_proto()->set_output_bytes(node->output_bytes());
+
   mutable_proto()->set_float_ops(node->float_ops());
 
   mutable_proto()->set_parameters(node->parameters());
@@ -157,6 +182,13 @@ void ShowMultiNode::AggregateTotalStats(ShowMultiNode* node) {
 
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              node_pb->total_requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        node_pb->total_peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            node_pb->total_residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          node_pb->total_output_bytes());
+
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         node_pb->total_parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -174,6 +206,13 @@ void ShowMultiNode::AddSelfToTotalStats() {
 
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              proto().requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        proto().peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            proto().residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          proto().output_bytes());
+
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         proto().parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -187,6 +226,10 @@ void ShowMultiNode::ResetTotalStats() {
   mutable_proto()->set_total_cpu_exec_micros(0);
 
   mutable_proto()->set_total_requested_bytes(0);
+  mutable_proto()->set_total_peak_bytes(0);
+  mutable_proto()->set_total_residual_bytes(0);
+  mutable_proto()->set_total_output_bytes(0);
+
   mutable_proto()->set_total_parameters(0);
   mutable_proto()->set_total_float_ops(0);
   mutable_proto()->mutable_children()->Clear();
diff --git a/tensorflow/core/profiler/internal/tfprof_op.cc b/tensorflow/core/profiler/internal/tfprof_op.cc
index ab013506ec..c04b0ea0c6 100644
--- a/tensorflow/core/profiler/internal/tfprof_op.cc
+++ b/tensorflow/core/profiler/internal/tfprof_op.cc
@@ -211,24 +211,44 @@ int64 TFOp::SearchRoot(const std::vector<OpNode*> nodes,
   return i;
 }
 
+string TFOp::FormatMemoryNode(int64 node_total_bytes, int64 root_total_bytes,
+                              int64 node_bytes) const {
+  double accu_pct = 0.0;
+  double pct = 0.0;
+  if (node_bytes > 0) {
+    accu_pct = 100.0 * node_total_bytes / root_total_bytes;
+    pct = 100.0 * node_bytes / root_total_bytes;
+  }
+  return strings::Printf(
+      "%30s", strings::Printf("%s (%.2f%%, %.2f%%)",
+                              FormatMemory(node_bytes).c_str(), accu_pct, pct)
+                  .c_str());
+}
+
 string TFOp::FormatNode(OpNode* node, OpNode* root, const Options& opts) const {
   std::vector<string> attrs;
 
   if (opts.select.find(kShown[0]) != opts.select.end()) {
-    double accu_pct = 0.0;
-    double pct = 0.0;
-    if (node->proto().requested_bytes() > 0) {
-      accu_pct = 100.0 * node->proto().total_requested_bytes() /
-          root->proto().total_requested_bytes();
-      pct = 100.0 * node->proto().requested_bytes() /
-          root->proto().total_requested_bytes();
-    }
-    attrs.push_back(strings::Printf(
-        "%30s",
-        strings::Printf("%s (%.2f%%, %.2f%%)",
-                        FormatMemory(node->proto().requested_bytes()).c_str(),
-                        accu_pct, pct)
-            .c_str()));
+    attrs.push_back(FormatMemoryNode(node->proto().total_requested_bytes(),
+                                     root->proto().total_requested_bytes(),
+                                     node->proto().requested_bytes()));
+  }
+
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    attrs.push_back(FormatMemoryNode(node->proto().total_peak_bytes(),
+                                     root->proto().total_peak_bytes(),
+                                     node->proto().peak_bytes()));
+  }
+
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    attrs.push_back(FormatMemoryNode(node->proto().total_residual_bytes(),
+                                     root->proto().total_residual_bytes(),
+                                     node->proto().residual_bytes()));
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
+    attrs.push_back(FormatMemoryNode(node->proto().total_output_bytes(),
+                                     root->proto().total_output_bytes(),
+                                     node->proto().output_bytes()));
   }
 
   if (opts.select.find(kShown[1]) != opts.select.end()) {
diff --git a/tensorflow/core/profiler/internal/tfprof_op.h b/tensorflow/core/profiler/internal/tfprof_op.h
index 9e20f5c3f4..55a346c7e8 100644
--- a/tensorflow/core/profiler/internal/tfprof_op.h
+++ b/tensorflow/core/profiler/internal/tfprof_op.h
@@ -65,6 +65,8 @@ class TFOp : public TFMultiShow {
   }
 
   string FormatNode(OpNode* node, OpNode* root, const Options& opts) const;
+  string FormatMemoryNode(int64 node_total_bytes, int64 root_total_bytes,
+                          int64 node_bytes) const;
 
   std::unique_ptr<OpNode> root_;
   std::map<string, std::unique_ptr<OpNode>> cnodes_map_;
diff --git a/tensorflow/core/profiler/internal/tfprof_options.cc b/tensorflow/core/profiler/internal/tfprof_options.cc
index 2b5e340cec..6634272541 100644
--- a/tensorflow/core/profiler/internal/tfprof_options.cc
+++ b/tensorflow/core/profiler/internal/tfprof_options.cc
@@ -151,9 +151,11 @@ tensorflow::Status Options::FromProtoStr(const string& opts_proto_str,
   }
 
   *opts = Options(
-      opts_pb.max_depth(), opts_pb.min_bytes(), opts_pb.min_micros(),
-      opts_pb.min_params(), opts_pb.min_float_ops(), opts_pb.min_occurrence(),
-      opts_pb.step(), opts_pb.order_by(),
+      opts_pb.max_depth(), opts_pb.min_bytes(), opts_pb.min_peak_bytes(),
+      opts_pb.min_residual_bytes(), opts_pb.min_output_bytes(),
+      opts_pb.min_micros(), opts_pb.min_accelerator_micros(),
+      opts_pb.min_cpu_micros(), opts_pb.min_params(), opts_pb.min_float_ops(),
+      opts_pb.min_occurrence(), opts_pb.step(), opts_pb.order_by(),
       std::vector<string>(opts_pb.account_type_regexes().begin(),
                           opts_pb.account_type_regexes().end()),
       std::vector<string>(opts_pb.start_name_regexes().begin(),
@@ -179,6 +181,11 @@ string Options::ToString() const {
       "%-28s%lld\n"
       "%-28s%lld\n"
       "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
       "%-28s%s\n"
       "%-28s%s\n"
       "%-28s%s\n"
@@ -188,17 +195,20 @@ string Options::ToString() const {
       "%-28s%s\n"
       "%-28s%s\n"
       "%-28s%s:%s\n",
-      kOptions[0], max_depth, kOptions[1], min_bytes, kOptions[2], min_micros,
-      kOptions[3], min_params, kOptions[4], min_float_ops, kOptions[5],
-      min_occurrence, kOptions[6], step, kOptions[7], order_by.c_str(),
-      kOptions[8], str_util::Join(account_type_regexes, ",").c_str(),
-      kOptions[9], str_util::Join(start_name_regexes, ",").c_str(),
-      kOptions[10], str_util::Join(trim_name_regexes, ",").c_str(),
-      kOptions[11], str_util::Join(show_name_regexes, ",").c_str(),
-      kOptions[12], str_util::Join(hide_name_regexes, ",").c_str(),
-      kOptions[13], (account_displayed_op_only ? "true" : "false"),
-      kOptions[14], str_util::Join(select, ",").c_str(), kOptions[15],
-      output_type.c_str(), KeyValueToStr(output_options).c_str());
+      kOptions[0], max_depth, kOptions[1], min_bytes, kOptions[2],
+      min_peak_bytes, kOptions[3], min_residual_bytes, kOptions[4],
+      min_output_bytes, kOptions[5], min_micros, kOptions[6],
+      min_accelerator_micros, kOptions[7], min_cpu_micros, kOptions[8],
+      min_params, kOptions[9], min_float_ops, kOptions[10], min_occurrence,
+      kOptions[11], step, kOptions[12], order_by.c_str(), kOptions[13],
+      str_util::Join(account_type_regexes, ",").c_str(), kOptions[14],
+      str_util::Join(start_name_regexes, ",").c_str(), kOptions[15],
+      str_util::Join(trim_name_regexes, ",").c_str(), kOptions[16],
+      str_util::Join(show_name_regexes, ",").c_str(), kOptions[17],
+      str_util::Join(hide_name_regexes, ",").c_str(), kOptions[18],
+      (account_displayed_op_only ? "true" : "false"), kOptions[19],
+      str_util::Join(select, ",").c_str(), kOptions[20], output_type.c_str(),
+      KeyValueToStr(output_options).c_str());
   return s;
 }
 
diff --git a/tensorflow/core/profiler/internal/tfprof_options.h b/tensorflow/core/profiler/internal/tfprof_options.h
index 8e78ee7463..463f5b3c3a 100644
--- a/tensorflow/core/profiler/internal/tfprof_options.h
+++ b/tensorflow/core/profiler/internal/tfprof_options.h
@@ -29,7 +29,12 @@ namespace tfprof {
 static const char* const kOptions[] = {
     "-max_depth",
     "-min_bytes",
+    "-min_peak_bytes",
+    "-min_residual_bytes",
+    "-min_output_bytes",
     "-min_micros",
+    "-min_accelerator_micros",
+    "-min_cpu_micros",
     "-min_params",
     "-min_float_ops",
     "-min_occurrence",
@@ -46,17 +51,21 @@ static const char* const kOptions[] = {
 };
 
 static const char* const kOrderBy[] = {
-    "name",       "bytes",  "micros",    "accelerator_micros",
-    "cpu_micros", "params", "float_ops", "occurrence",
+    "name",         "bytes",     "peak_bytes",         "residual_bytes",
+    "output_bytes", "micros",    "accelerator_micros", "cpu_micros",
+    "params",       "float_ops", "occurrence",
 };
 
 // Append Only.
 // TODO(xpan): As we are adding more fields to be selected, we
 // need to have a way to tell users what fields are available in which view.
-static const char* const kShown[] = {
-    "bytes",     "micros",   "params",     "float_ops",    "tensor_value",
-    "device",    "op_types", "occurrence", "input_shapes", "accelerator_micros",
-    "cpu_micros"};
+static const char* const kShown[] = {"bytes",          "micros",
+                                     "params",         "float_ops",
+                                     "tensor_value",   "device",
+                                     "op_types",       "occurrence",
+                                     "input_shapes",   "accelerator_micros",
+                                     "cpu_micros",     "peak_bytes",
+                                     "residual_bytes", "output_bytes"};
 
 static const char* const kCmds[] = {
     "scope", "graph", "code", "op", "advise", "set", "help",
@@ -94,11 +103,15 @@ struct Options {
 
   virtual ~Options() {}
   Options()
-      : Options(0, 0, 0, 0, 0, 0, 0, "", {}, {}, {}, {}, {}, false, {}, "",
-                {}) {}
+      : Options(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "", {}, {}, {}, {}, {},
+                false, {}, "", {}) {}
 
   Options(int max_depth, tensorflow::int64 min_bytes,
-          tensorflow::int64 min_micros, tensorflow::int64 min_params,
+          tensorflow::int64 min_peak_bytes,
+          tensorflow::int64 min_residual_bytes,
+          tensorflow::int64 min_output_bytes, tensorflow::int64 min_micros,
+          tensorflow::int64 min_accelerator_micros,
+          tensorflow::int64 min_cpu_micros, tensorflow::int64 min_params,
           tensorflow::int64 min_float_ops, tensorflow::int64 min_occurrence,
           tensorflow::int64 step, const string& order_by,
           const std::vector<string>& account_type_regexes,
@@ -111,7 +124,12 @@ struct Options {
           const std::map<string, string>& output_options)
       : max_depth(max_depth),
         min_bytes(min_bytes),
+        min_peak_bytes(min_peak_bytes),
+        min_residual_bytes(min_residual_bytes),
+        min_output_bytes(min_output_bytes),
         min_micros(min_micros),
+        min_accelerator_micros(min_accelerator_micros),
+        min_cpu_micros(min_cpu_micros),
         min_params(min_params),
         min_float_ops(min_float_ops),
         min_occurrence(min_occurrence),
@@ -131,7 +149,12 @@ struct Options {
 
   int max_depth;
   tensorflow::int64 min_bytes;
+  tensorflow::int64 min_peak_bytes;
+  tensorflow::int64 min_residual_bytes;
+  tensorflow::int64 min_output_bytes;
   tensorflow::int64 min_micros;
+  tensorflow::int64 min_accelerator_micros;
+  tensorflow::int64 min_cpu_micros;
   tensorflow::int64 min_params;
   tensorflow::int64 min_float_ops;
   tensorflow::int64 min_occurrence;
diff --git a/tensorflow/core/profiler/internal/tfprof_show.cc b/tensorflow/core/profiler/internal/tfprof_show.cc
index 630eba4ff2..cf28876089 100644
--- a/tensorflow/core/profiler/internal/tfprof_show.cc
+++ b/tensorflow/core/profiler/internal/tfprof_show.cc
@@ -73,8 +73,14 @@ bool TFShow::ShouldShow(const ShowNode* node, const Options& opts,
   // Always show kTFProfRoot.
   if (node->name() == kTFProfRoot) return true;
 
-  if (node->proto().requested_bytes() < opts.min_bytes ||
-      node->proto().exec_micros() < opts.min_micros ||
+  if (node->proto().total_requested_bytes() < opts.min_bytes ||
+      node->proto().total_peak_bytes() < opts.min_peak_bytes ||
+      node->proto().total_residual_bytes() < opts.min_residual_bytes ||
+      node->proto().total_output_bytes() < opts.min_output_bytes ||
+      node->proto().total_exec_micros() < opts.min_micros ||
+      node->proto().total_accelerator_exec_micros() <
+          opts.min_accelerator_micros ||
+      node->proto().total_cpu_exec_micros() < opts.min_cpu_micros ||
       node->proto().parameters() < opts.min_params ||
       node->proto().float_ops() < opts.min_float_ops ||
       node->proto().run_count() < opts.min_occurrence ||
@@ -128,6 +134,17 @@ bool TFShow::ReAccount(ShowNode* node, const Options& opts) {
   return false;
 }
 
+string TFShow::FormatNodeMemory(ShowNode* node, int64 bytes,
+                                int64 total_bytes) const {
+  string memory = FormatMemory(total_bytes);
+  if (node->account) {
+    memory = FormatMemory(bytes) + "/" + memory;
+  } else {
+    memory = "--/" + memory;
+  }
+  return memory;
+}
+
 string TFShow::FormatNode(ShowNode* node, const Options& opts) const {
   std::vector<string> info;
   if (opts.select.find(kShown[2]) != opts.select.end()) {
@@ -152,15 +169,22 @@ string TFShow::FormatNode(ShowNode* node, const Options& opts) const {
     }
     info.push_back(fops);
   }
+  std::vector<string> attrs;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
-    string memory = FormatMemory(node->proto().total_requested_bytes());
-    if (node->account) {
-      memory = FormatMemory(node->proto().requested_bytes()) + "/" + memory;
-
-    } else {
-      memory = "--/" + memory;
-    }
-    info.push_back(memory);
+    info.push_back(FormatNodeMemory(node, node->proto().requested_bytes(),
+                                    node->proto().total_requested_bytes()));
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    info.push_back(FormatNodeMemory(node, node->proto().peak_bytes(),
+                                    node->proto().total_peak_bytes()));
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    info.push_back(FormatNodeMemory(node, node->proto().residual_bytes(),
+                                    node->proto().total_residual_bytes()));
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
+    info.push_back(FormatNodeMemory(node, node->proto().output_bytes(),
+                                    node->proto().total_output_bytes()));
   }
   if (opts.select.find(kShown[1]) != opts.select.end()) {
     info.push_back(FormatTotalExecTime(node, opts));
@@ -225,6 +249,15 @@ string TFShow::FormatLegend(const Options& opts) const {
     legends.push_back("# float_ops");
   }
   if (opts.select.find(kShown[0]) != opts.select.end()) {
+    legends.push_back("requested bytes");
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    legends.push_back("peak bytes");
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    legends.push_back("residual bytes");
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
     legends.push_back("output bytes");
   }
   if (opts.select.find(kShown[1]) != opts.select.end()) {
diff --git a/tensorflow/core/profiler/internal/tfprof_show.h b/tensorflow/core/profiler/internal/tfprof_show.h
index 2f7e0e6211..08c231bad7 100644
--- a/tensorflow/core/profiler/internal/tfprof_show.h
+++ b/tensorflow/core/profiler/internal/tfprof_show.h
@@ -67,6 +67,7 @@ class TFShow {
   bool ReAccount(ShowNode* node, const Options& opts);
 
   string FormatNode(ShowNode* node, const Options& opts) const;
+  string FormatNodeMemory(ShowNode* node, int64 bytes, int64 total_bytes) const;
 
   string FormatLegend(const Options& opts) const;
 
@@ -87,17 +88,25 @@ class TFShow {
         return n1->proto().total_requested_bytes() >
                n2->proto().total_requested_bytes();
       } else if (opts.order_by == kOrderBy[2]) {
+        return n1->proto().total_peak_bytes() > n2->proto().total_peak_bytes();
+      } else if (opts.order_by == kOrderBy[3]) {
+        return n1->proto().total_residual_bytes() >
+               n2->proto().total_residual_bytes();
+      } else if (opts.order_by == kOrderBy[4]) {
+        return n1->proto().total_output_bytes() >
+               n2->proto().total_output_bytes();
+      } else if (opts.order_by == kOrderBy[5]) {
         return n1->proto().total_exec_micros() >
                n2->proto().total_exec_micros();
-      } else if (opts.order_by == kOrderBy[3]) {
+      } else if (opts.order_by == kOrderBy[6]) {
         return n1->proto().total_accelerator_exec_micros() >
                n2->proto().total_accelerator_exec_micros();
-      } else if (opts.order_by == kOrderBy[4]) {
+      } else if (opts.order_by == kOrderBy[7]) {
         return n1->proto().total_cpu_exec_micros() >
                n2->proto().total_cpu_exec_micros();
-      } else if (opts.order_by == kOrderBy[5]) {
+      } else if (opts.order_by == kOrderBy[8]) {
         return n1->proto().total_parameters() > n2->proto().total_parameters();
-      } else if (opts.order_by == kOrderBy[6]) {
+      } else if (opts.order_by == kOrderBy[9]) {
         return n1->proto().total_float_ops() > n2->proto().total_float_ops();
       }
       return name_cmp;
diff --git a/tensorflow/core/profiler/internal/tfprof_show_multi.cc b/tensorflow/core/profiler/internal/tfprof_show_multi.cc
index 34b3e9e3f0..eb826a7137 100644
--- a/tensorflow/core/profiler/internal/tfprof_show_multi.cc
+++ b/tensorflow/core/profiler/internal/tfprof_show_multi.cc
@@ -65,7 +65,13 @@ bool TFMultiShow::ShouldShow(const ShowMultiNode* node, const Options& opts,
   // want to see the middle code traces (i.e. their own codes.), instead
   // of the TensorFlow internal codes traces.
   if (node->proto().total_requested_bytes() < opts.min_bytes ||
+      node->proto().total_peak_bytes() < opts.min_peak_bytes ||
+      node->proto().total_residual_bytes() < opts.min_residual_bytes ||
+      node->proto().total_output_bytes() < opts.min_output_bytes ||
       node->proto().total_exec_micros() < opts.min_micros ||
+      node->proto().total_accelerator_exec_micros() <
+          opts.min_accelerator_micros ||
+      node->proto().total_cpu_exec_micros() < opts.min_cpu_micros ||
       node->proto().total_parameters() < opts.min_params ||
       node->proto().total_float_ops() < opts.min_float_ops ||
       depth > opts.max_depth || !ShouldShowIfExtra(node, opts, depth)) {
@@ -109,6 +115,15 @@ bool TFMultiShow::ReAccount(ShowMultiNode* node, const Options& opts) {
 string TFMultiShow::FormatLegend(const Options& opts) const {
   std::vector<string> legends;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
+    legends.push_back("requested bytes");
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    legends.push_back("peak bytes");
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    legends.push_back("residual bytes");
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
     legends.push_back("output bytes");
   }
   if (opts.select.find(kShown[1]) != opts.select.end()) {
diff --git a/tensorflow/core/profiler/internal/tfprof_show_multi.h b/tensorflow/core/profiler/internal/tfprof_show_multi.h
index f731f6afbb..a632c66933 100644
--- a/tensorflow/core/profiler/internal/tfprof_show_multi.h
+++ b/tensorflow/core/profiler/internal/tfprof_show_multi.h
@@ -90,21 +90,30 @@ class TFMultiShow {
                   return n1->proto().total_requested_bytes() >
                          n2->proto().total_requested_bytes();
                 } else if (opts.order_by == kOrderBy[2]) {
+                  return n1->proto().total_peak_bytes() >
+                         n2->proto().total_peak_bytes();
+                } else if (opts.order_by == kOrderBy[3]) {
+                  return n1->proto().total_residual_bytes() >
+                         n2->proto().total_residual_bytes();
+                } else if (opts.order_by == kOrderBy[4]) {
+                  return n1->proto().total_output_bytes() >
+                         n2->proto().total_output_bytes();
+                } else if (opts.order_by == kOrderBy[5]) {
                   return n1->proto().total_exec_micros() >
                          n2->proto().total_exec_micros();
-                } else if (opts.order_by == kOrderBy[3]) {
+                } else if (opts.order_by == kOrderBy[6]) {
                   return n1->proto().total_accelerator_exec_micros() >
                          n2->proto().total_accelerator_exec_micros();
-                } else if (opts.order_by == kOrderBy[4]) {
+                } else if (opts.order_by == kOrderBy[7]) {
                   return n1->proto().total_cpu_exec_micros() >
                          n2->proto().total_cpu_exec_micros();
-                } else if (opts.order_by == kOrderBy[5]) {
+                } else if (opts.order_by == kOrderBy[8]) {
                   return n1->proto().total_parameters() >
                          n2->proto().total_parameters();
-                } else if (opts.order_by == kOrderBy[6]) {
+                } else if (opts.order_by == kOrderBy[9]) {
                   return n1->proto().total_float_ops() >
                          n2->proto().total_float_ops();
-                } else if (opts.order_by == kOrderBy[7]) {
+                } else if (opts.order_by == kOrderBy[10]) {
                   return n1->node->graph_nodes().size() >
                          n2->node->graph_nodes().size();
                 }
diff --git a/tensorflow/core/profiler/internal/tfprof_show_test.cc b/tensorflow/core/profiler/internal/tfprof_show_test.cc
index e2ba113e9b..f2c8b662d0 100644
--- a/tensorflow/core/profiler/internal/tfprof_show_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_show_test.cc
@@ -22,12 +22,12 @@ limitations under the License.
 #include "tensorflow/core/lib/io/path.h"
 #include "tensorflow/core/platform/env.h"
 #include "tensorflow/core/platform/test.h"
-#include "tensorflow/core/protobuf/config.pb.h"
 #include "tensorflow/core/profiler/internal/tfprof_constants.h"
 #include "tensorflow/core/profiler/internal/tfprof_options.h"
 #include "tensorflow/core/profiler/internal/tfprof_utils.h"
 #include "tensorflow/core/profiler/tfprof_log.pb.h"
 #include "tensorflow/core/profiler/tfprof_output.pb.h"
+#include "tensorflow/core/protobuf/config.pb.h"
 
 namespace tensorflow {
 namespace tfprof {
@@ -73,90 +73,79 @@ class TFProfShowTest : public ::testing::Test {
 
 TEST_F(TFProfShowTest, DumpScopeMode) {
   string dump_file = io::JoinPath(testing::TmpDir(), "dump");
-  Options opts(5, 0, 0, 0, 0, 0, -1, "name",
-               {"VariableV2"},  // accout_type_regexes
-               {".*"}, {""}, {".*"}, {""}, false,
-               {"params", "bytes", "micros", "float_ops"}, "file",
-               {{"outfile", dump_file}});
+  Options opts(
+      5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name",
+      {"VariableV2"},  // account_type_regexes
+      {".*"}, {""}, {".*"}, {""}, false,
+      {"params", "bytes", "peak_bytes", "residual_bytes", "output_bytes",
+       "micros", "accelerator_micros", "cpu_micros", "float_ops"},
+      "file", {{"outfile", dump_file}});
   tf_stats_->ShowGraphNode("scope", opts);
 
   string dump_str;
   TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
   EXPECT_EQ(
-      "node name | # parameters | # float_ops | output bytes | total execution "
-      "time | accelerator execution time | cpu execution time\n_TFProfRoot "
-      "(--/370 params, --/0 flops, --/1.48KB, --/5us, --/0us, --/5us)\n  "
-      "conv2d (--/140 params, --/0 flops, --/560B, --/2us, --/0us, --/2us)\n   "
-      " conv2d/bias (5, 5/5 params, 0/0 flops, 20B/20B, 1us/1us, 0us/0us, "
-      "1us/1us)\n    conv2d/kernel (3x3x3x5, 135/135 params, 0/0 flops, "
-      "540B/540B, 1us/1us, 0us/0us, 1us/1us)\n  conv2d_1 (--/230 params, --/0 "
-      "flops, --/920B, --/3us, --/0us, --/3us)\n    conv2d_1/bias (5, 5/5 "
-      "params, 0/0 flops, 20B/20B, 1us/1us, 0us/0us, 1us/1us)\n    "
-      "conv2d_1/kernel (3x3x5x5, 225/225 params, 0/0 flops, 900B/900B, "
-      "2us/2us, 0us/0us, 2us/2us)\n",
+      "node name | # parameters | # float_ops | requested bytes | peak bytes | "
+      "residual bytes | output bytes | total execution time | accelerator "
+      "execution time | cpu execution time\n_TFProfRoot (--/451 params, --/0 "
+      "flops, --/0B, --/0B, --/0B, --/2.56KB, --/13us, --/0us, --/13us)\n  DW "
+      "(3x3x3x6, 162/162 params, 0/0 flops, 0B/0B, 0B/0B, 0B/0B, "
+      "1.28KB/1.28KB, 2us/2us, 0us/0us, 2us/2us)\n  DW2 (2x2x6x12, 288/288 "
+      "params, 0/0 flops, 0B/0B, 0B/0B, 0B/0B, 1.28KB/1.28KB, 11us/11us, "
+      "0us/0us, 11us/11us)\n  ScalarW (1, 1/1 params, 0/0 flops, 0B/0B, 0B/0B, "
+      "0B/0B, 0B/0B, 0us/0us, 0us/0us, 0us/0us)\n",
       dump_str);
 }
 
 TEST_F(TFProfShowTest, DumpAcceleratorAndCPUMicros) {
   string dump_file = io::JoinPath(testing::TmpDir(), "dump");
-  Options opts(
-      5, 0, 0, 0, 0, 0, -1, "cpu_micros", {".*"},  // accout_type_regexes
-      {".*"}, {""}, {".*"}, {""}, false, {"accelerator_micros", "cpu_micros"},
-      "file", {{"outfile", dump_file}});
+  Options opts(5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "cpu_micros",
+               {".*"},  // account_type_regexes
+               {".*"}, {""}, {".*"}, {""}, false,
+               {"accelerator_micros", "cpu_micros"}, "file",
+               {{"outfile", dump_file}});
   tf_stats_->ShowGraphNode("scope", opts);
 
   string dump_str;
   TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
   EXPECT_EQ(
       "node name | accelerator execution time | cpu execution "
-      "time\n_TFProfRoot (--/0us, --/97us)\n  conv2d (0us/0us, 0us/76us)\n    "
-      "conv2d/convolution (0us/0us, 60us/60us)\n      conv2d/convolution/Shape "
-      "(0us/0us, 0us/0us)\n      conv2d/convolution/dilation_rate (0us/0us, "
-      "0us/0us)\n    conv2d/BiasAdd (0us/0us, 12us/12us)\n    conv2d/bias "
-      "(0us/0us, 1us/2us)\n      conv2d/bias/read (0us/0us, 1us/1us)\n      "
-      "conv2d/bias/Assign (0us/0us, 0us/0us)\n      conv2d/bias/Initializer "
-      "(0us/0us, 0us/0us)\n        conv2d/bias/Initializer/Const (0us/0us, "
-      "0us/0us)\n    conv2d/kernel (0us/0us, 1us/2us)\n      "
-      "conv2d/kernel/read (0us/0us, 1us/1us)\n      conv2d/kernel/Assign "
-      "(0us/0us, 0us/0us)\n      conv2d/kernel/Initializer (0us/0us, "
-      "0us/0us)\n        conv2d/kernel/Initializer/random_uniform (0us/0us, "
-      "0us/0us)\n  conv2d_2 (0us/0us, 0us/15us)\n    conv2d_2/convolution "
-      "(0us/0us, 13us/13us)\n      conv2d_2/convolution/Shape (0us/0us, "
-      "0us/0us)\n      conv2d_2/convolution/dilation_rate (0us/0us, 0us/0us)\n "
-      "   conv2d_2/BiasAdd (0us/0us, 2us/2us)\n  conv2d_1 (0us/0us, 0us/5us)\n "
-      "   conv2d_1/kernel (0us/0us, 2us/3us)\n      conv2d_1/kernel/read "
-      "(0us/0us, 1us/1us)\n      conv2d_1/kernel/Assign (0us/0us, 0us/0us)\n   "
-      "   conv2d_1/kernel/Initializer (0us/0us, 0us/0us)\n        "
-      "conv2d_1/kernel/Initializer/random_uniform (0us/0us, 0us/0us)\n    "
-      "conv2d_1/bias (0us/0us, 1us/2us)\n      conv2d_1/bias/read (0us/0us, "
-      "1us/1us)\n      conv2d_1/bias/Assign (0us/0us, 0us/0us)\n      "
-      "conv2d_1/bias/Initializer (0us/0us, 0us/0us)\n        "
-      "conv2d_1/bias/Initializer/Const (0us/0us, 0us/0us)\n  zeros (0us/0us, "
-      "1us/1us)\n  init (0us/0us, 0us/0us)\n  save (0us/0us, 0us/0us)\n    "
-      "save/Assign (0us/0us, 0us/0us)\n    save/Assign_1 (0us/0us, 0us/0us)\n  "
-      "  save/Assign_2 (0us/0us, 0us/0us)\n    save/Assign_3 (0us/0us, "
-      "0us/0us)\n    save/Const (0us/0us, 0us/0us)\n    save/RestoreV2 "
-      "(0us/0us, 0us/0us)\n      save/RestoreV2/shape_and_slices (0us/0us, "
-      "0us/0us)\n      save/RestoreV2/tensor_names (0us/0us, 0us/0us)\n    "
-      "save/RestoreV2_1 (0us/0us, 0us/0us)\n      "
-      "save/RestoreV2_1/shape_and_slices (0us/0us, 0us/0us)\n      "
-      "save/RestoreV2_1/tensor_names (0us/0us, 0us/0us)\n    save/RestoreV2_2 "
-      "(0us/0us, 0us/0us)\n      save/RestoreV2_2/shape_and_slices (0us/0us, "
-      "0us/0us)\n      save/RestoreV2_2/tensor_names (0us/0us, 0us/0us)\n    "
-      "save/RestoreV2_3 (0us/0us, 0us/0us)\n      "
-      "save/RestoreV2_3/shape_and_slices (0us/0us, 0us/0us)\n      "
-      "save/RestoreV2_3/tensor_names (0us/0us, 0us/0us)\n    save/SaveV2 "
-      "(0us/0us, 0us/0us)\n      save/SaveV2/shape_and_slices (0us/0us, "
-      "0us/0us)\n      save/SaveV2/tensor_names (0us/0us, 0us/0us)\n    "
-      "save/control_dependency (0us/0us, 0us/0us)\n    save/restore_all "
-      "(0us/0us, 0us/0us)\n",
+      "time\n_TFProfRoot (--/404us, --/4.50ms)\n  Conv2D (226us/226us, "
+      "4.07ms/4.07ms)\n  Conv2D_1 (178us/178us, 419us/419us)\n  DW2 (0us/0us, "
+      "11us/11us)\n    DW2/Assign (0us/0us, 0us/0us)\n    DW2/Initializer "
+      "(0us/0us, 0us/0us)\n      DW2/Initializer/random_normal (0us/0us, "
+      "0us/0us)\n        DW2/Initializer/random_normal/RandomStandardNormal "
+      "(0us/0us, 0us/0us)\n        DW2/Initializer/random_normal/mean "
+      "(0us/0us, 0us/0us)\n        DW2/Initializer/random_normal/mul (0us/0us, "
+      "0us/0us)\n        DW2/Initializer/random_normal/shape (0us/0us, "
+      "0us/0us)\n        DW2/Initializer/random_normal/stddev (0us/0us, "
+      "0us/0us)\n    DW2/read (0us/0us, 0us/0us)\n  DW (0us/0us, 2us/2us)\n    "
+      "DW/Assign (0us/0us, 0us/0us)\n    DW/Initializer (0us/0us, 0us/0us)\n   "
+      "   DW/Initializer/random_normal (0us/0us, 0us/0us)\n        "
+      "DW/Initializer/random_normal/RandomStandardNormal (0us/0us, 0us/0us)\n  "
+      "      DW/Initializer/random_normal/mean (0us/0us, 0us/0us)\n        "
+      "DW/Initializer/random_normal/mul (0us/0us, 0us/0us)\n        "
+      "DW/Initializer/random_normal/shape (0us/0us, 0us/0us)\n        "
+      "DW/Initializer/random_normal/stddev (0us/0us, 0us/0us)\n    DW/read "
+      "(0us/0us, 0us/0us)\n  zeros (0us/0us, 2us/2us)\n  ScalarW (0us/0us, "
+      "0us/0us)\n    ScalarW/Assign (0us/0us, 0us/0us)\n    "
+      "ScalarW/Initializer (0us/0us, 0us/0us)\n      "
+      "ScalarW/Initializer/random_normal (0us/0us, 0us/0us)\n        "
+      "ScalarW/Initializer/random_normal/RandomStandardNormal (0us/0us, "
+      "0us/0us)\n        ScalarW/Initializer/random_normal/mean (0us/0us, "
+      "0us/0us)\n        ScalarW/Initializer/random_normal/mul (0us/0us, "
+      "0us/0us)\n        ScalarW/Initializer/random_normal/shape (0us/0us, "
+      "0us/0us)\n        ScalarW/Initializer/random_normal/stddev (0us/0us, "
+      "0us/0us)\n    ScalarW/read (0us/0us, 0us/0us)\n  init (0us/0us, "
+      "0us/0us)\n",
       dump_str);
 }
 
 TEST_F(TFProfShowTest, DumpOpMode) {
   string dump_file = io::JoinPath(testing::TmpDir(), "dump");
   Options opts(
-      5, 0, 0, 0, 0, 4, -1, "params", {".*"},  // accout_type_regexes
+      5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, "params",
+      {".*"},  // account_type_regexes
       {".*"}, {""}, {".*"}, {""}, false,
       {"params", "bytes", "micros", "float_ops", "occurrence", "input_shapes"},
       "file", {{"outfile", dump_file}});
@@ -165,17 +154,32 @@ TEST_F(TFProfShowTest, DumpOpMode) {
   string dump_str;
   TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
   EXPECT_EQ(
-      "nodename|outputbytes|totalexecutiontime|acceleratorexecutiontime|"
+      "nodename|requestedbytes|totalexecutiontime|acceleratorexecutiontime|"
       "cpuexecutiontime|#parameters|#float_ops|opoccurrence(run|defined)|"
-      "inputshapes\nVariableV21.48KB(100.00%,17.10%),5us(100.00%,5.15%),0us(0."
-      "00%,0.00%),5us(100.00%,5.15%),370params(100.00%,100.00%),0float_ops(100."
-      "00%,0.00%),4|4\n\ninput_type:\t(run*4|defined*4)\texec_time:"
-      "5us\n\nAssign0B(0.00%,0.00%),0us(94.85%,0.00%),0us(0.00%,0.00%),0us(94."
-      "85%,0.00%),0params(0.00%,0.00%),0float_ops(100.00%,0.00%),0|8\n\ninput_"
-      "type:0:unknown,\t1:unknown\t(run*0|defined*8)\texec_time:0us\n\nConst1."
-      "54KB(58.87%,17.74%),1us(80.41%,1.03%),0us(0.00%,0.00%),1us(80.41%,1.03%)"
-      ",0params(0.00%,0.00%),0float_ops(98.49%,0.00%),1|24\n\ninput_type:\t("
-      "run*1|defined*24)\texec_time:1us\n\n",
+      "inputshapes\nVariableV20B(0.00%,0.00%),13us(100.00%,0.27%),0us(100.00%,"
+      "0.00%),13us(100.00%,0.29%),451params(100.00%,100.00%),0float_ops(100.00%"
+      ",0.00%),2|3\n\ninput_type:\t(run*2|defined*3)\texec_time:13us\n\nAdd0B("
+      "0.00%,0.00%),0us(99.73%,0.00%),0us(100.00%,0.00%),0us(99.71%,0.00%),"
+      "0params(0.00%,0.00%),0float_ops(100.00%,0.00%),0|3\n\ninput_type:0:1,"
+      "\t1:1\t(run*0|defined*1)\texec_time:0us\ninput_type:0:2x2x6x12,\t1:1\t("
+      "run*0|defined*1)\texec_time:0us\ninput_type:0:3x3x3x6,\t1:1\t(run*0|"
+      "defined*1)\texec_time:0us\n\nAssign0B(0.00%,0.00%),0us(99.73%,0.00%),"
+      "0us(100.00%,0.00%),0us(99.71%,0.00%),0params(0.00%,0.00%),0float_ops("
+      "100.00%,0.00%),0|3\n\ninput_type:0:1,\t1:1\t(run*0|defined*1)\texec_"
+      "time:0us\ninput_type:0:2x2x6x12,\t1:2x2x6x12\t(run*0|defined*1)\texec_"
+      "time:0us\ninput_type:0:3x3x3x6,\t1:3x3x3x6\t(run*0|defined*1)\texec_"
+      "time:0us\n\nConst0B(0.00%,0.00%),2us(99.73%,0.04%),0us(100.00%,0.00%),"
+      "2us(99.71%,0.04%),0params(0.00%,0.00%),0float_ops(100.00%,0.00%),1|"
+      "10\n\ninput_type:\t(run*1|defined*10)\texec_time:2us\n\nConv2D14.59KB("
+      "100.00%,100.00%),4.89ms(99.69%,99.69%),404us(100.00%,100.00%),4.49ms(99."
+      "67%,99.67%),0params(0.00%,0.00%),10.44kfloat_ops(100.00%,100.00%),2|"
+      "2\n\ninput_type:0:2x3x3x6,\t1:2x2x6x12\t(run*1|defined*1)\texec_time:"
+      "597us\ninput_type:0:2x6x6x3,\t1:3x3x3x6\t(run*1|defined*1)\texec_time:4."
+      "29ms\n\nIdentity0B(0.00%,0.00%),0us(0.00%,0.00%),0us(0.00%,0.00%),0us(0."
+      "00%,0.00%),0params(0.00%,0.00%),0float_ops(0.00%,0.00%),0|3\n\ninput_"
+      "type:0:1\t(run*0|defined*1)\texec_time:0us\ninput_type:0:2x2x6x12\t(run*"
+      "0|defined*1)\texec_time:0us\ninput_type:0:3x3x3x6\t(run*0|defined*1)"
+      "\texec_time:0us\n\n",
       StringReplace(dump_str, " ", ""));
 }
 }  // namespace tfprof
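
Note on the test changes in tfprof_stats_test.cc, the next file in this patch: every `Options` call grows from six numeric arguments before `step` to eleven. The sketch below shows one way to read the new positions. The field names and their order are assumptions inferred from the commit message and from comparing the old and new `TestFloatOps` calls; they are not copied from tfprof_options.h. `OldThresholds`, `NewThresholds` and `Widen` are hypothetical helpers used only for illustration.

```cpp
#include <cstdint>

// ASSUMPTION: names and order are inferred from the call sites in this patch,
// not taken from the real Options constructor.
struct OldThresholds {
  int max_depth;
  std::int64_t min_bytes, min_micros, min_params, min_float_ops, min_occurrence;
};

struct NewThresholds {
  int max_depth;
  std::int64_t min_bytes, min_peak_bytes, min_residual_bytes, min_output_bytes;
  std::int64_t min_micros, min_accelerator_micros, min_cpu_micros;
  std::int64_t min_params, min_float_ops, min_occurrence;
};

// Map an old-style argument list onto the wider one; new thresholds default to 0.
constexpr NewThresholds Widen(const OldThresholds& o) {
  return NewThresholds{o.max_depth, o.min_bytes, 0, 0, 0,
                       o.min_micros, 0, 0,
                       o.min_params, o.min_float_ops, o.min_occurrence};
}

// Old TestFloatOps call: Options opts(10, 0, 0, 0, 1, 0, -1, ...).
constexpr OldThresholds kOldFloatOps{10, 0, 0, 0, 1, 0};
constexpr NewThresholds kNewFloatOps = Widen(kOldFloatOps);

// New call: Options opts(10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -1, ...), so the
// single non-zero threshold lands in the ninth slot after max_depth.
static_assert(kNewFloatOps.min_float_ops == 1, "float_ops threshold preserved");
static_assert(kNewFloatOps.min_peak_bytes == 0, "new thresholds default to 0");
```

Under this reading, the other tests keep their behavior by passing zeros for the added thresholds.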
diff --git a/tensorflow/core/profiler/internal/tfprof_stats_test.cc b/tensorflow/core/profiler/internal/tfprof_stats_test.cc
index 8744f5be28..e67c158521 100644
--- a/tensorflow/core/profiler/internal/tfprof_stats_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_stats_test.cc
@@ -23,12 +23,12 @@ limitations under the License.
 #include "tensorflow/core/platform/env.h"
 #include "tensorflow/core/platform/protobuf.h"
 #include "tensorflow/core/platform/test.h"
-#include "tensorflow/core/protobuf/config.pb.h"
 #include "tensorflow/core/profiler/internal/tfprof_constants.h"
 #include "tensorflow/core/profiler/internal/tfprof_options.h"
 #include "tensorflow/core/profiler/internal/tfprof_utils.h"
 #include "tensorflow/core/profiler/tfprof_log.pb.h"
 #include "tensorflow/core/profiler/tfprof_output.pb.h"
+#include "tensorflow/core/protobuf/config.pb.h"
 
 namespace tensorflow {
 namespace tfprof {
@@ -73,7 +73,7 @@ class TFProfStatsTest : public ::testing::Test {
 };
 
 TEST_F(TFProfStatsTest, CustomOpType) {
-  Options opts(3, 0, 0, 0, 0, 0, -1, "name",
+  Options opts(3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name",
                {kTrainableVarType},  // accout_type_regexes
                {".*"}, {""}, {".*"}, {""}, false,
                {"params", "bytes", "micros", "float_ops"}, "", {});
@@ -81,62 +81,27 @@ TEST_F(TFProfStatsTest, CustomOpType) {
 
   GraphNodeProto expected;
   CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 5\ntotal_requested_bytes: 1480\ntotal_parameters: "
-      "370\nchildren {\n  name: \"conv2d\"\n  exec_micros: 0\n  "
-      "requested_bytes: 0\n  total_exec_micros: 2\n  total_requested_bytes: "
-      "560\n  total_parameters: 140\n  children {\n    name: \"conv2d/bias\"\n "
-      "   exec_micros: 1\n    requested_bytes: 20\n    parameters: 5\n    "
-      "total_exec_micros: 1\n    total_requested_bytes: 20\n    "
-      "total_parameters: 5\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 1\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 1\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  children {\n    name: "
-      "\"conv2d/kernel\"\n    exec_micros: 1\n    requested_bytes: 540\n    "
-      "parameters: 135\n    total_exec_micros: 1\n    total_requested_bytes: "
-      "540\n    total_parameters: 135\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 1\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 1\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  float_ops: 0\n  total_float_ops: 0\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 0\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 2\n  "
-      "run_count: 0\n  total_run_count: 2\n  total_definition_count: "
-      "3\n}\nchildren {\n  name: \"conv2d_1\"\n  exec_micros: 0\n  "
-      "requested_bytes: 0\n  total_exec_micros: 3\n  total_requested_bytes: "
-      "920\n  total_parameters: 230\n  children {\n    name: "
-      "\"conv2d_1/bias\"\n    exec_micros: 1\n    requested_bytes: 20\n    "
-      "parameters: 5\n    total_exec_micros: 1\n    total_requested_bytes: "
-      "20\n    total_parameters: 5\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 1\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 1\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  children {\n    name: "
-      "\"conv2d_1/kernel\"\n    exec_micros: 2\n    requested_bytes: 900\n    "
-      "parameters: 225\n    total_exec_micros: 2\n    total_requested_bytes: "
-      "900\n    total_parameters: 225\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 2\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 2\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  float_ops: 0\n  total_float_ops: 0\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 0\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 3\n  "
-      "run_count: 0\n  total_run_count: 2\n  total_definition_count: "
-      "3\n}\nfloat_ops: 0\ntotal_float_ops: 0\naccelerator_exec_micros: "
-      "0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
-      "0\ntotal_cpu_exec_micros: 5\nrun_count: 0\ntotal_run_count: "
-      "4\ntotal_definition_count: 6\n",
+      "name: \"_TFProfRoot\"\ntotal_exec_micros: 13\ntotal_parameters: "
+      "451\nchildren {\n  name: \"DW\"\n  exec_micros: 2\n  parameters: 162\n  "
+      "total_exec_micros: 2\n  total_parameters: 162\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  cpu_exec_micros: 2\n  "
+      "total_cpu_exec_micros: 2\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  output_bytes: 1280\n  total_output_bytes: "
+      "1280\n}\nchildren {\n  name: \"DW2\"\n  exec_micros: 11\n  parameters: "
+      "288\n  total_exec_micros: 11\n  total_parameters: 288\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  cpu_exec_micros: 11\n  "
+      "total_cpu_exec_micros: 11\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  output_bytes: 1280\n  total_output_bytes: "
+      "1280\n}\nchildren {\n  name: \"ScalarW\"\n  parameters: 1\n  "
+      "total_parameters: 1\n  total_definition_count: "
+      "1\n}\ntotal_cpu_exec_micros: 13\ntotal_run_count: "
+      "2\ntotal_definition_count: 3\ntotal_output_bytes: 2560\n",
       &expected));
   EXPECT_EQ(expected.DebugString(), root.DebugString());
 }
 
 TEST_F(TFProfStatsTest, CheckPointOpType) {
-  Options opts(3, 0, 0, 0, 0, 0, -1, "name",
+  Options opts(3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name",
                {kCkptVarType},  // accout_type_regexes
                {".*"}, {""}, {".*"}, {""}, false,
                {"params", "bytes", "micros", "float_ops"}, "", {});
@@ -144,169 +109,235 @@ TEST_F(TFProfStatsTest, CheckPointOpType) {
 
   GraphNodeProto expected;
   CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 5\ntotal_requested_bytes: 1480\ntotal_parameters: "
-      "370\nchildren {\n  name: \"conv2d\"\n  exec_micros: 0\n  "
-      "requested_bytes: 0\n  total_exec_micros: 2\n  total_requested_bytes: "
-      "560\n  total_parameters: 140\n  children {\n    name: \"conv2d/bias\"\n "
-      "   exec_micros: 1\n    requested_bytes: 20\n    parameters: 5\n    "
-      "total_exec_micros: 1\n    total_requested_bytes: 20\n    "
-      "total_parameters: 5\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 1\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 1\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  children {\n    name: "
-      "\"conv2d/kernel\"\n    exec_micros: 1\n    requested_bytes: 540\n    "
-      "parameters: 135\n    total_exec_micros: 1\n    total_requested_bytes: "
-      "540\n    total_parameters: 135\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 1\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 1\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  float_ops: 0\n  total_float_ops: 0\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 0\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 2\n  "
-      "run_count: 0\n  total_run_count: 2\n  total_definition_count: "
-      "3\n}\nchildren {\n  name: \"conv2d_1\"\n  exec_micros: 0\n  "
-      "requested_bytes: 0\n  total_exec_micros: 3\n  total_requested_bytes: "
-      "920\n  total_parameters: 230\n  children {\n    name: "
-      "\"conv2d_1/bias\"\n    exec_micros: 1\n    requested_bytes: 20\n    "
-      "parameters: 5\n    total_exec_micros: 1\n    total_requested_bytes: "
-      "20\n    total_parameters: 5\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 1\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 1\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  children {\n    name: "
-      "\"conv2d_1/kernel\"\n    exec_micros: 2\n    requested_bytes: 900\n    "
-      "parameters: 225\n    total_exec_micros: 2\n    total_requested_bytes: "
-      "900\n    total_parameters: 225\n    devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n    float_ops: 0\n    "
-      "total_float_ops: 0\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 2\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 2\n    run_count: 1\n    total_run_count: 1\n    "
-      "total_definition_count: 1\n  }\n  float_ops: 0\n  total_float_ops: 0\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 0\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 3\n  "
-      "run_count: 0\n  total_run_count: 2\n  total_definition_count: "
-      "3\n}\nfloat_ops: 0\ntotal_float_ops: 0\naccelerator_exec_micros: "
-      "0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
-      "0\ntotal_cpu_exec_micros: 5\nrun_count: 0\ntotal_run_count: "
-      "4\ntotal_definition_count: 6\n",
+      "name: \"_TFProfRoot\"\ntotal_exec_micros: 13\ntotal_parameters: "
+      "451\nchildren {\n  name: \"DW\"\n  exec_micros: 2\n  parameters: 162\n  "
+      "total_exec_micros: 2\n  total_parameters: 162\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  cpu_exec_micros: 2\n  "
+      "total_cpu_exec_micros: 2\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  output_bytes: 1280\n  total_output_bytes: "
+      "1280\n}\nchildren {\n  name: \"DW2\"\n  exec_micros: 11\n  parameters: "
+      "288\n  total_exec_micros: 11\n  total_parameters: 288\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  cpu_exec_micros: 11\n  "
+      "total_cpu_exec_micros: 11\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  output_bytes: 1280\n  total_output_bytes: "
+      "1280\n}\nchildren {\n  name: \"ScalarW\"\n  parameters: 1\n  "
+      "total_parameters: 1\n  total_definition_count: "
+      "1\n}\ntotal_cpu_exec_micros: 13\ntotal_run_count: "
+      "2\ntotal_definition_count: 3\ntotal_output_bytes: 2560\n",
       &expected));
   EXPECT_EQ(expected.DebugString(), root.DebugString());
 }
 
 TEST_F(TFProfStatsTest, TestGraph) {
-  Options opts(100, 0, 10000, 0, 0, 0, -1, "name", {".*"},
-               {"cost.*"},  // start_name_regexes
+  Options opts(100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {".*"},
+               {"DW/Initializer/random_normal/mul"},  // start_name_regexes
                {""}, {".*"}, {""}, false,
                {"params", "bytes", "micros", "float_ops"}, "", {});
   const GraphNodeProto& root = tf_stats_->ShowGraphNode("graph", opts);
 
   GraphNodeProto expected;
   CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 97\ntotal_requested_bytes: "
-      "8656\ntotal_parameters: 370\nfloat_ops: 0\ntotal_float_ops: "
-      "34360\naccelerator_exec_micros: 0\ncpu_exec_micros: "
-      "0\ntotal_accelerator_exec_micros: 0\ntotal_cpu_exec_micros: "
-      "97\nrun_count: 0\ntotal_run_count: 13\ntotal_definition_count: 60\n",
+      "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
+      "14592\ntotal_parameters: 451\nchildren {\n  name: "
+      "\"DW/Initializer/random_normal/mul\"\n  children {\n    name: "
+      "\"DW/Initializer/random_normal/RandomStandardNormal\"\n    children {\n "
+      "     name: \"DW/Initializer/random_normal/shape\"\n      "
+      "total_definition_count: 1\n    }\n    input_shapes {\n      key: 0\n    "
+      "  value {\n        dim {\n          size: 4\n        }\n      }\n    "
+      "}\n    total_definition_count: 2\n  }\n  children {\n    name: "
+      "\"DW/Initializer/random_normal/stddev\"\n    total_definition_count: "
+      "1\n  }\n  input_shapes {\n    key: 0\n    value {\n      dim {\n        "
+      "size: 3\n      }\n      dim {\n        size: 3\n      }\n      dim {\n  "
+      "      size: 3\n      }\n      dim {\n        size: 6\n      }\n    }\n  "
+      "}\n  input_shapes {\n    key: 1\n    value {\n      dim {\n        "
+      "size: 1\n      }\n    }\n  }\n  total_definition_count: "
+      "4\n}\ntotal_float_ops: 10440\ntotal_accelerator_exec_micros: "
+      "404\ntotal_cpu_exec_micros: 4500\ntotal_run_count: "
+      "5\ntotal_definition_count: 31\ntotal_peak_bytes: "
+      "9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
       &expected));
   EXPECT_EQ(expected.DebugString(), root.DebugString());
 }
 
 TEST_F(TFProfStatsTest, TestFloatOps) {
-  Options opts(10, 0, 0, 0, 1, 0, -1, "name", {".*"}, {".*"}, {""}, {".*"},
-               {""}, false, {"float_ops"}, "", {});
+  Options opts(10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -1, "name", {".*"}, {".*"},
+               {""}, {".*"}, {""}, false, {"float_ops"}, "", {});
   const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
 
   GraphNodeProto expected;
   CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 97\ntotal_requested_bytes: "
-      "8656\ntotal_parameters: 370\nchildren {\n  name: \"conv2d/BiasAdd\"\n  "
-      "exec_micros: 12\n  requested_bytes: 1440\n  total_exec_micros: 12\n  "
-      "total_requested_bytes: 1440\n  total_parameters: 0\n  devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n  float_ops: 360\n  "
-      "total_float_ops: 360\n  input_shapes {\n    key: 0\n    value {\n      "
-      "unknown_rank: true\n    }\n  }\n  input_shapes {\n    key: 1\n    value "
-      "{\n      unknown_rank: true\n    }\n  }\n  accelerator_exec_micros: 0\n "
-      " cpu_exec_micros: 12\n  total_accelerator_exec_micros: 0\n  "
-      "total_cpu_exec_micros: 12\n  run_count: 1\n  total_run_count: 1\n  "
-      "total_definition_count: 1\n}\nchildren {\n  name: "
-      "\"conv2d/convolution\"\n  exec_micros: 60\n  requested_bytes: 1440\n  "
-      "total_exec_micros: 60\n  total_requested_bytes: 1440\n  "
-      "total_parameters: 0\n  devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n  float_ops: 19440\n  "
-      "total_float_ops: 19440\n  input_shapes {\n    key: 0\n    value {\n     "
-      " unknown_rank: true\n    }\n  }\n  input_shapes {\n    key: 1\n    "
-      "value {\n      unknown_rank: true\n    }\n  }\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 60\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 60\n  "
-      "run_count: 1\n  total_run_count: 1\n  total_definition_count: "
-      "3\n}\nchildren {\n  name: \"conv2d_2/BiasAdd\"\n  exec_micros: 2\n  "
-      "requested_bytes: 640\n  total_exec_micros: 2\n  total_requested_bytes: "
-      "640\n  total_parameters: 0\n  devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n  float_ops: 160\n  "
-      "total_float_ops: 160\n  input_shapes {\n    key: 0\n    value {\n      "
-      "unknown_rank: true\n    }\n  }\n  input_shapes {\n    key: 1\n    value "
-      "{\n      unknown_rank: true\n    }\n  }\n  accelerator_exec_micros: 0\n "
-      " cpu_exec_micros: 2\n  total_accelerator_exec_micros: 0\n  "
-      "total_cpu_exec_micros: 2\n  run_count: 1\n  total_run_count: 1\n  "
-      "total_definition_count: 1\n}\nchildren {\n  name: "
-      "\"conv2d_2/convolution\"\n  exec_micros: 13\n  requested_bytes: 640\n  "
-      "total_exec_micros: 13\n  total_requested_bytes: 640\n  "
-      "total_parameters: 0\n  devices: "
-      "\"/job:localhost/replica:0/task:0/cpu:0\"\n  float_ops: 14400\n  "
-      "total_float_ops: 14400\n  input_shapes {\n    key: 0\n    value {\n     "
-      " unknown_rank: true\n    }\n  }\n  input_shapes {\n    key: 1\n    "
-      "value {\n      unknown_rank: true\n    }\n  }\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 13\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 13\n  "
-      "run_count: 1\n  total_run_count: 1\n  total_definition_count: "
-      "3\n}\nfloat_ops: 0\ntotal_float_ops: 34360\naccelerator_exec_micros: "
-      "0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
-      "0\ntotal_cpu_exec_micros: 97\nrun_count: 0\ntotal_run_count: "
-      "13\ntotal_definition_count: 68\n",
+      "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
+      "14592\ntotal_parameters: 451\nchildren {\n  name: \"Conv2D\"\n  "
+      "exec_micros: 4292\n  requested_bytes: 9472\n  total_exec_micros: 4292\n "
+      " total_requested_bytes: 9472\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  float_ops: 5832\n  "
+      "total_float_ops: 5832\n  input_shapes {\n    key: 0\n    value {\n      "
+      "dim {\n        size: 2\n      }\n      dim {\n        size: 6\n      "
+      "}\n      dim {\n        size: 6\n      }\n      dim {\n        size: "
+      "3\n      }\n    }\n  }\n  input_shapes {\n    key: 1\n    value {\n     "
+      " dim {\n        size: 3\n      }\n      dim {\n        size: 3\n      "
+      "}\n      dim {\n        size: 3\n      }\n      dim {\n        size: "
+      "6\n      }\n    }\n  }\n  accelerator_exec_micros: 226\n  "
+      "cpu_exec_micros: 4066\n  total_accelerator_exec_micros: 226\n  "
+      "total_cpu_exec_micros: 4066\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  peak_bytes: 5888\n  residual_bytes: 768\n  "
+      "output_bytes: 768\n  total_peak_bytes: 5888\n  total_residual_bytes: "
+      "768\n  total_output_bytes: 768\n}\nchildren {\n  name: \"Conv2D_1\"\n  "
+      "exec_micros: 597\n  requested_bytes: 5120\n  total_exec_micros: 597\n  "
+      "total_requested_bytes: 5120\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  float_ops: 4608\n  "
+      "total_float_ops: 4608\n  input_shapes {\n    key: 0\n    value {\n      "
+      "dim {\n        size: 2\n      }\n      dim {\n        size: 3\n      "
+      "}\n      dim {\n        size: 3\n      }\n      dim {\n        size: "
+      "6\n      }\n    }\n  }\n  input_shapes {\n    key: 1\n    value {\n     "
+      " dim {\n        size: 2\n      }\n      dim {\n        size: 2\n      "
+      "}\n      dim {\n        size: 6\n      }\n      dim {\n        size: "
+      "12\n      }\n    }\n  }\n  accelerator_exec_micros: 178\n  "
+      "cpu_exec_micros: 419\n  total_accelerator_exec_micros: 178\n  "
+      "total_cpu_exec_micros: 419\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  peak_bytes: 4096\n  residual_bytes: 512\n  "
+      "output_bytes: 512\n  total_peak_bytes: 4096\n  total_residual_bytes: "
+      "512\n  total_output_bytes: 512\n}\ntotal_float_ops: "
+      "10440\ntotal_accelerator_exec_micros: 404\ntotal_cpu_exec_micros: "
+      "4500\ntotal_run_count: 5\ntotal_definition_count: 34\ntotal_peak_bytes: "
+      "9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
       &expected));
   EXPECT_EQ(expected.DebugString(), root.DebugString());
 }
 
 TEST_F(TFProfStatsTest, TestAccountShownNameOnly) {
-  Options opts(100, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"}, {""},
-               {"unit_2_1.*DW"},  // show_name_regexes.
-               {""}, true,        // account_displayed_op_only.
+  Options opts(100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"},
+               {""}, {"Conv2D_1"},  // show_name_regexes.
+               {""}, true,          // account_displayed_op_only.
                {"params"}, "", {});
   const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
 
   GraphNodeProto expected;
   CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 0\ntotal_requested_bytes: 0\ntotal_parameters: "
-      "0\nfloat_ops: 0\ntotal_float_ops: 0\naccelerator_exec_micros: "
-      "0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
-      "0\ntotal_cpu_exec_micros: 0\nrun_count: 0\ntotal_run_count: "
-      "0\ntotal_definition_count: 1\n",
+      "name: \"_TFProfRoot\"\ntotal_exec_micros: 597\ntotal_requested_bytes: "
+      "5120\nchildren {\n  name: \"Conv2D_1\"\n  exec_micros: 597\n  "
+      "requested_bytes: 5120\n  total_exec_micros: 597\n  "
+      "total_requested_bytes: 5120\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  float_ops: 4608\n  "
+      "total_float_ops: 4608\n  input_shapes {\n    key: 0\n    value {\n      "
+      "dim {\n        size: 2\n      }\n      dim {\n        size: 3\n      "
+      "}\n      dim {\n        size: 3\n      }\n      dim {\n        size: "
+      "6\n      }\n    }\n  }\n  input_shapes {\n    key: 1\n    value {\n     "
+      " dim {\n        size: 2\n      }\n      dim {\n        size: 2\n      "
+      "}\n      dim {\n        size: 6\n      }\n      dim {\n        size: "
+      "12\n      }\n    }\n  }\n  accelerator_exec_micros: 178\n  "
+      "cpu_exec_micros: 419\n  total_accelerator_exec_micros: 178\n  "
+      "total_cpu_exec_micros: 419\n  run_count: 1\n  total_run_count: 1\n  "
+      "total_definition_count: 1\n  peak_bytes: 4096\n  residual_bytes: 512\n  "
+      "output_bytes: 512\n  total_peak_bytes: 4096\n  total_residual_bytes: "
+      "512\n  total_output_bytes: 512\n}\ntotal_float_ops: "
+      "4608\ntotal_accelerator_exec_micros: 178\ntotal_cpu_exec_micros: "
+      "419\ntotal_run_count: 1\ntotal_definition_count: 2\ntotal_peak_bytes: "
+      "4096\ntotal_residual_bytes: 512\ntotal_output_bytes: 512\n",
       &expected));
   EXPECT_EQ(expected.DebugString(), root.DebugString());
 }
 
 TEST_F(TFProfStatsTest, TestShowTensorValue) {
-  Options opts(10, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"}, {""},
-               {"unit_1_0.*gamma"}, {""}, false,
+  Options opts(10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"},
+               {""}, {"DW"}, {""}, false,
                {"tensor_value"},  // Show tensor value from checkpoint.
                "", {});
   const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
   GraphNodeProto expected;
   CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 97\ntotal_requested_bytes: "
-      "8656\ntotal_parameters: 370\nfloat_ops: 0\ntotal_float_ops: "
-      "34360\naccelerator_exec_micros: 0\ncpu_exec_micros: "
-      "0\ntotal_accelerator_exec_micros: 0\ntotal_cpu_exec_micros: "
-      "97\nrun_count: 0\ntotal_run_count: 13\ntotal_definition_count: 68\n",
+      "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
+      "14592\ntotal_parameters: 451\nchildren {\n  name: \"DW\"\n  "
+      "exec_micros: 2\n  parameters: 162\n  total_exec_micros: 2\n  "
+      "total_parameters: 162\n  devices: "
+      "\"/job:localhost/replica:0/task:0/gpu:0\"\n  tensor_value {\n    dtype: "
+      "DT_FLOAT\n    value_double: -0.000534315\n    value_double: "
+      "-0.00089602\n    value_double: -0.000417239\n    value_double: "
+      "0.00041444\n    value_double: 0.000780691\n    value_double: "
+      "-0.000559057\n    value_double: -0.000234623\n    value_double: "
+      "0.00013393\n    value_double: -0.00187574\n    value_double: "
+      "0.000785666\n    value_double: 0.000673294\n    value_double: "
+      "0.000653368\n    value_double: 0.000924489\n    value_double: "
+      "-0.000318373\n    value_double: -0.000385202\n    value_double: "
+      "-7.92661e-05\n    value_double: 2.70287e-05\n    value_double: "
+      "0.00152302\n    value_double: 8.04435e-05\n    value_double: "
+      "-0.00058102\n    value_double: 0.000244291\n    value_double: "
+      "-0.000438045\n    value_double: -0.000110199\n    value_double: "
+      "0.000731663\n    value_double: -0.0012326\n    value_double: "
+      "0.00064065\n    value_double: -0.00135203\n    value_double: "
+      "-6.42784e-05\n    value_double: -0.0011857\n    value_double: "
+      "-0.000487383\n    value_double: 3.41493e-05\n    value_double: "
+      "-0.00158447\n    value_double: 0.00168448\n    value_double: "
+      "0.00160946\n    value_double: -0.000600483\n    value_double: "
+      "0.000650259\n    value_double: -0.00109938\n    value_double: "
+      "-0.000842166\n    value_double: -0.0022673\n    value_double: "
+      "-0.00101941\n    value_double: -0.0011169\n    value_double: "
+      "-0.0013557\n    value_double: -1.46354e-05\n    value_double: "
+      "-1.05487e-05\n    value_double: -0.00092014\n    value_double: "
+      "0.00272874\n    value_double: 5.13942e-05\n    value_double: "
+      "-0.00223472\n    value_double: -0.000250875\n    value_double: "
+      "-0.00180747\n    value_double: -0.00234714\n    value_double: "
+      "-0.00113523\n    value_double: -0.00112635\n    value_double: "
+      "-0.000843118\n    value_double: -6.84256e-05\n    value_double: "
+      "0.000243336\n    value_double: 0.00119151\n    value_double: "
+      "0.00131022\n    value_double: 0.000768038\n    value_double: "
+      "-8.90095e-05\n    value_double: -0.000626427\n    value_double: "
+      "-7.0617e-05\n    value_double: -0.0021988\n    value_double: "
+      "-0.00221544\n    value_double: -0.000393118\n    value_double: "
+      "0.000159464\n    value_double: -0.000874746\n    value_double: "
+      "-0.00131239\n    value_double: -0.00135747\n    value_double: "
+      "-0.00179753\n    value_double: -0.00101005\n    value_double: "
+      "-0.000107518\n    value_double: -0.000616882\n    value_double: "
+      "-0.000360923\n    value_double: -0.00026896\n    value_double: "
+      "-0.000142548\n    value_double: 0.000577227\n    value_double: "
+      "0.000536027\n    value_double: 0.00126907\n    value_double: "
+      "-0.00122712\n    value_double: -3.60499e-05\n    value_double: "
+      "0.000151026\n    value_double: 0.00107658\n    value_double: "
+      "0.00116475\n    value_double: -0.00145312\n    value_double: "
+      "0.000233326\n    value_double: -0.00020198\n    value_double: "
+      "0.00179029\n    value_double: 0.00150048\n    value_double: "
+      "-0.000884775\n    value_double: 0.000409188\n    value_double: "
+      "2.97176e-05\n    value_double: -0.000506118\n    value_double: "
+      "-2.33992e-05\n    value_double: -0.00037212\n    value_double: "
+      "0.000862773\n    value_double: 0.00174046\n    value_double: "
+      "-0.000240207\n    value_double: 0.000663976\n    value_double: "
+      "-0.00134747\n    value_double: 0.00115585\n    value_double: "
+      "0.000555869\n    value_double: 0.00176722\n    value_double: "
+      "-0.000518409\n    value_double: 0.00101051\n    value_double: "
+      "0.000129399\n    value_double: -0.000916389\n    value_double: "
+      "-0.00137693\n    value_double: -0.00152412\n    value_double: "
+      "7.32515e-05\n    value_double: -0.000190811\n    value_double: "
+      "-0.000158692\n    value_double: -5.7791e-05\n    value_double: "
+      "0.000671785\n    value_double: -0.00152924\n    value_double: "
+      "0.00117314\n    value_double: -0.000384202\n    value_double: "
+      "0.00176709\n    value_double: -0.000181703\n    value_double: "
+      "-0.000460994\n    value_double: 0.000643716\n    value_double: "
+      "4.76719e-05\n    value_double: -0.00101037\n    value_double: "
+      "0.00159621\n    value_double: 0.00186758\n    value_double: "
+      "0.00100001\n    value_double: -0.00121831\n    value_double: "
+      "0.00132231\n    value_double: 0.0013511\n    value_double: 0.00106659\n "
+      "   value_double: 0.00018091\n    value_double: 0.00155925\n    "
+      "value_double: 4.26087e-05\n    value_double: 0.000243264\n    "
+      "value_double: -0.0017202\n    value_double: -0.000218897\n    "
+      "value_double: 0.00118693\n    value_double: 0.00258909\n    "
+      "value_double: 0.000641913\n    value_double: -0.0013211\n    "
+      "value_double: -0.00171943\n    value_double: 0.00089151\n    "
+      "value_double: -0.00114969\n    value_double: -0.000196331\n    "
+      "value_double: 0.00109994\n    value_double: 0.000302616\n    "
+      "value_double: 0.000675812\n    value_double: 0.00112222\n    "
+      "value_double: 0.000516456\n    value_double: 0.00133357\n    "
+      "value_double: 0.000298491\n    value_double: 0.00145934\n    "
+      "value_double: -0.00159102\n    value_double: -0.000819061\n    "
+      "value_double: 0.000120583\n    value_double: 0.0006108\n    "
+      "value_double: 0.00124132\n    value_double: 0.000764859\n    "
+      "value_double: 0.000374641\n    value_double: -0.00149603\n    "
+      "value_double: -0.000317367\n    value_double: -0.000417829\n  }\n  "
+      "cpu_exec_micros: 2\n  total_cpu_exec_micros: 2\n  run_count: 1\n  "
+      "total_run_count: 1\n  total_definition_count: 10\n  output_bytes: "
+      "1280\n  total_output_bytes: 1280\n}\ntotal_float_ops: "
+      "10440\ntotal_accelerator_exec_micros: 404\ntotal_cpu_exec_micros: "
+      "4500\ntotal_run_count: 5\ntotal_definition_count: 34\ntotal_peak_bytes: "
+      "9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
       &expected));
   EXPECT_EQ(expected.DebugString(), root.DebugString());
 }
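
Note on the expected values in tfprof_stats_test.cc above: in `TestFloatOps`, the root totals for several fields equal the sums over the two displayed `Conv2D` children. The sketch below checks that arithmetic with the numbers copied from the expected proto text. `NodeStats` and `Sum` are hypothetical stand-ins for illustration, not part of the profiler.

```cpp
#include <cstdint>

// Hypothetical stand-in for a few GraphNodeProto fields (illustration only).
struct NodeStats {
  std::int64_t requested_bytes;
  std::int64_t peak_bytes;
  std::int64_t residual_bytes;
  std::int64_t float_ops;
  std::int64_t accelerator_exec_micros;
};

constexpr NodeStats Sum(const NodeStats& a, const NodeStats& b) {
  return NodeStats{a.requested_bytes + b.requested_bytes,
                   a.peak_bytes + b.peak_bytes,
                   a.residual_bytes + b.residual_bytes,
                   a.float_ops + b.float_ops,
                   a.accelerator_exec_micros + b.accelerator_exec_micros};
}

// Per-node values from the TestFloatOps expectation.
constexpr NodeStats kConv2D{9472, 5888, 768, 5832, 226};
constexpr NodeStats kConv2D1{5120, 4096, 512, 4608, 178};
constexpr NodeStats kChildren = Sum(kConv2D, kConv2D1);

// Root totals from the same expectation.
static_assert(kChildren.requested_bytes == 14592, "total_requested_bytes");
static_assert(kChildren.peak_bytes == 9984, "total_peak_bytes");
static_assert(kChildren.residual_bytes == 1280, "total_residual_bytes");
static_assert(kChildren.float_ops == 10440, "total_float_ops");
static_assert(kChildren.accelerator_exec_micros == 404,
              "total_accelerator_exec_micros");
```

`total_output_bytes` is left out on purpose. The root reports 4864 while the two children sum to 1280, which suggests the root also counts nodes hidden by the `float_ops` filter, such as `DW` (1280 output bytes in `TestShowTensorValue`).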
diff --git a/tensorflow/core/profiler/internal/tfprof_tensor.h b/tensorflow/core/profiler/internal/tfprof_tensor.h
index d6c4ae1311..9f72e081c9 100644
--- a/tensorflow/core/profiler/internal/tfprof_tensor.h
+++ b/tensorflow/core/profiler/internal/tfprof_tensor.h
@@ -51,6 +51,33 @@ class TFProfTensor {
 
   void Build();
 
+  template <typename T>
+  void AddValue(const T& value, TFProfTensorProto* dim) {
+    std::ostringstream sstream;
+    sstream << value;
+    if (typeid(value) == typeid(double)) {
+      double double_val;
+      CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
+      dim->add_value_double(double_val);
+      formatted_str_ += strings::Printf(
+          "%.2f ", dim->value_double(dim->value_double_size() - 1));
+    } else if (typeid(value) == typeid(int64)) {
+      int64 int64_val;
+      CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
+      dim->add_value_int64(int64_val);
+      formatted_str_ += strings::Printf(
+          "%lld ",
+          static_cast<int64>(dim->value_int64(dim->value_int64_size() - 1)));
+    } else if (typeid(value) == typeid(string)) {
+      dim->add_value_str(sstream.str());
+      formatted_str_ =
+          strings::StrCat(formatted_str_, "'",
+                          dim->value_str(dim->value_str_size() - 1) + "' ");
+    } else {
+      CHECK(false) << "Unsupported type: " << typeid(value).name();
+    }
+  }
+
   // It assumes the flatten values are stored in row-major, which is mentioned
   // indirectly at various places:
   // TODO(xpan): Further verifying it.
@@ -59,37 +86,65 @@ class TFProfTensor {
                     TFProfTensorProto* dim) {
     formatted_str_ += "[";
     int64 nstart = start;
-    for (int i = 0; i < tensor_->dim_size(depth); i++) {
-      // Last dimension, pull the values.
-      if (depth == tensor_->dims() - 1) {
-        std::ostringstream sstream;
-        sstream << values[nstart];
-
-        if (typeid(values[nstart]) == typeid(double)) {
-          double double_val;
-          CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
-          dim->add_value_double(double_val);
-          formatted_str_ += strings::Printf(
-              "%.2f ", dim->value_double(dim->value_double_size() - 1));
-        } else if (typeid(values[nstart]) == typeid(int64)) {
-          int64 int64_val;
-          CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
-          dim->add_value_int64(int64_val);
-          formatted_str_ += strings::Printf(
-              "%lld ", static_cast<int64>(
-                           dim->value_int64(dim->value_int64_size() - 1)));
-        } else if (typeid(values[nstart]) == typeid(string)) {
-          dim->add_value_str(sstream.str());
-          formatted_str_ =
-              strings::StrCat(formatted_str_, "'",
-                              dim->value_str(dim->value_str_size() - 1) + "' ");
+    if (tensor_->dims() == 0 && values.size() == 1) {
+      std::ostringstream sstream;
+      sstream << values[nstart];
+
+      if (typeid(values[nstart]) == typeid(double)) {
+        double double_val;
+        CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
+        dim->add_value_double(double_val);
+        formatted_str_ += strings::Printf(
+            "%.2f ", dim->value_double(dim->value_double_size() - 1));
+      } else if (typeid(values[nstart]) == typeid(int64)) {
+        int64 int64_val;
+        CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
+        dim->add_value_int64(int64_val);
+        formatted_str_ += strings::Printf(
+            "%lld ",
+            static_cast<int64>(dim->value_int64(dim->value_int64_size() - 1)));
+      } else if (typeid(values[nstart]) == typeid(string)) {
+        dim->add_value_str(sstream.str());
+        formatted_str_ =
+            strings::StrCat(formatted_str_, "'",
+                            dim->value_str(dim->value_str_size() - 1) + "' ");
+      } else {
+        CHECK(false) << "Unsupported type: " << typeid(values[nstart]).name();
+      }
+    } else {
+      for (int i = 0; i < tensor_->dim_size(depth); i++) {
+        // Last dimension, pull the values.
+        if (depth == tensor_->dims() - 1) {
+          std::ostringstream sstream;
+          sstream << values[nstart];
+
+          if (typeid(values[nstart]) == typeid(double)) {
+            double double_val;
+            CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
+            dim->add_value_double(double_val);
+            formatted_str_ += strings::Printf(
+                "%.2f ", dim->value_double(dim->value_double_size() - 1));
+          } else if (typeid(values[nstart]) == typeid(int64)) {
+            int64 int64_val;
+            CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
+            dim->add_value_int64(int64_val);
+            formatted_str_ += strings::Printf(
+                "%lld ", static_cast<int64>(
+                             dim->value_int64(dim->value_int64_size() - 1)));
+          } else if (typeid(values[nstart]) == typeid(string)) {
+            dim->add_value_str(sstream.str());
+            formatted_str_ = strings::StrCat(
+                formatted_str_, "'",
+                dim->value_str(dim->value_str_size() - 1) + "' ");
+          } else {
+            CHECK(false) << "Unsupported type: "
+                         << typeid(values[nstart]).name();
+          }
+          ++nstart;
         } else {
-          CHECK(false) << "Unsupported type: " << typeid(values[nstart]).name();
+          // Not-last dimension. Drill deeper.
+          nstart = BuildOutput<T>(nstart, depth + 1, values, dim);
         }
-        ++nstart;
-      } else {
-        // Not-last dimension. Drill deeper.
-        nstart = BuildOutput<T>(nstart, depth + 1, values, dim);
       }
     }
     if (formatted_str_.length() > kTFProfTenosrMaxDisplayLen) {
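The hunk above adds a rank-0 (scalar) branch next to the existing row-major recursion, and repeats the same `typeid` chain in both branches. The sketch below is not the tfprof code. It shows the same traversal with the per-type formatting moved into overloads. `FormatValue` and `FormatTensor` are hypothetical names, `dims` stands in for the tensor shape, and the closing `]` is an assumption about how the real output is terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helpers, not part of tfprof: one overload per supported type
// takes the place of the runtime typeid() chain.
inline std::string FormatValue(double v) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.2f ", v);
  return buf;
}
inline std::string FormatValue(std::int64_t v) {
  return std::to_string(v) + " ";
}
inline std::string FormatValue(const std::string& v) {
  return "'" + v + "' ";
}

// Appends the values of a row-major tensor with shape `dims` to `out`,
// starting at flat index `start`. Returns the next unread flat index.
template <typename T>
std::size_t FormatTensor(const std::vector<T>& values,
                         const std::vector<int>& dims, std::size_t depth,
                         std::size_t start, std::string* out) {
  *out += "[";
  if (dims.empty()) {
    // Scalar: rank 0, exactly one value, no dimension to loop over.
    assert(values.size() == 1);
    *out += FormatValue(values[start++]);
  } else {
    for (int i = 0; i < dims[depth]; ++i) {
      if (depth + 1 == dims.size()) {
        // Last dimension: emit the value itself.
        *out += FormatValue(values[start++]);
      } else {
        // Not the last dimension: recurse one level deeper.
        start = FormatTensor(values, dims, depth + 1, start, out);
      }
    }
  }
  *out += "]";
  return start;
}
```

Under these assumptions, a 2x2 tensor of doubles holding 1, 2.5, 3 and 4 is formatted as `[[1.00 2.50 ][3.00 4.00 ]]`, and a scalar int64 7 as `[7 ]`. An unsupported element type fails at compile time here, whereas the patch reports it at run time through `CHECK(false)`.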
diff --git a/tensorflow/core/profiler/internal/tfprof_tensor_test.cc b/tensorflow/core/profiler/internal/tfprof_tensor_test.cc
index 50ef82abc9..c68888e88f 100644
--- a/tensorflow/core/profiler/internal/tfprof_tensor_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_tensor_test.cc
@@ -18,12 +18,12 @@ limitations under the License.
 #include "tensorflow/core/lib/io/path.h"
 #include "tensorflow/core/platform/protobuf.h"
 #include "tensorflow/core/platform/test.h"
-#include "tensorflow/core/protobuf/config.pb.h"
 #include "tensorflow/core/profiler/internal/tfprof_options.h"
 #include "tensorflow/core/profiler/internal/tfprof_stats.h"
 #include "tensorflow/core/profiler/internal/tfprof_utils.h"
 #include "tensorflow/core/profiler/tfprof_log.pb.h"
 #include "tensorflow/core/profiler/tfprof_output.pb.h"
+#include "tensorflow/core/protobuf/config.pb.h"
 
 namespace tensorflow {
 namespace tfprof {
@@ -57,244 +57,19 @@ class TFProfTensorTest : public ::testing::Test {
 };
 
 TEST_F(TFProfTensorTest, Basics) {
-  Options opts(3, 0, 0, 0, 0, 0, -1, "name", {"VariableV2"}, {".*"}, {""},
-               {".*"}, {""}, false, {"tensor_value"},  // show the tensor value.
+  Options opts(3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {"VariableV2"},
+               {".*"}, {""}, {".*"}, {""}, false,
+               {"tensor_value"},  // show the tensor value.
                "", {});
   const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
 
   GraphNodeProto expected;
-  CHECK(protobuf::TextFormat::ParseFromString(
-      "name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: "
-      "0\ntotal_exec_micros: 0\ntotal_requested_bytes: 0\ntotal_parameters: "
-      "370\nchildren {\n  name: \"conv2d\"\n  exec_micros: 0\n  "
-      "requested_bytes: 0\n  total_exec_micros: 0\n  total_requested_bytes: "
-      "0\n  total_parameters: 140\n  children {\n    name: \"conv2d/bias\"\n   "
-      " exec_micros: 0\n    requested_bytes: 0\n    parameters: 5\n    "
-      "total_exec_micros: 0\n    total_requested_bytes: 0\n    "
-      "total_parameters: 5\n    float_ops: 0\n    total_float_ops: 0\n    "
-      "tensor_value {\n      dtype: DT_FLOAT\n      value_double: 0\n      "
-      "value_double: 0\n      value_double: 0\n      value_double: 0\n      "
-      "value_double: 0\n    }\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 0\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 0\n    run_count: 0\n    total_run_count: 0\n    "
-      "total_definition_count: 1\n  }\n  children {\n    name: "
-      "\"conv2d/kernel\"\n    exec_micros: 0\n    requested_bytes: 0\n    "
-      "parameters: 135\n    total_exec_micros: 0\n    total_requested_bytes: "
-      "0\n    total_parameters: 135\n    float_ops: 0\n    total_float_ops: "
-      "0\n    tensor_value {\n      dtype: DT_FLOAT\n      value_double: "
-      "-0.113138\n      value_double: 0.261431\n      value_double: 0.215777\n "
-      "     value_double: 0.24135\n      value_double: -0.113195\n      "
-      "value_double: -0.212639\n      value_double: -0.0907301\n      "
-      "value_double: 0.0221634\n      value_double: 0.21821\n      "
-      "value_double: 0.22715\n      value_double: -0.108698\n      "
-      "value_double: 0.240911\n      value_double: -0.138626\n      "
-      "value_double: -0.144752\n      value_double: -0.00962037\n      "
-      "value_double: 0.0971008\n      value_double: 0.00264764\n      "
-      "value_double: -0.272929\n      value_double: 0.0129845\n      "
-      "value_double: 0.0466554\n      value_double: -0.229184\n      "
-      "value_double: 0.153576\n      value_double: -0.169218\n      "
-      "value_double: -0.112991\n      value_double: 0.205739\n      "
-      "value_double: 0.257844\n      value_double: 0.107455\n      "
-      "value_double: -0.207914\n      value_double: 0.15211\n      "
-      "value_double: 0.277932\n      value_double: 0.145986\n      "
-      "value_double: -0.0883989\n      value_double: 0.167506\n      "
-      "value_double: 0.10237\n      value_double: 0.0542143\n      "
-      "value_double: 0.0334378\n      value_double: 0.159489\n      "
-      "value_double: 0.246583\n      value_double: 0.0154283\n      "
-      "value_double: 0.0872411\n      value_double: -0.25732\n      "
-      "value_double: 0.0499355\n      value_double: 0.0266221\n      "
-      "value_double: 0.088801\n      value_double: -0.0794552\n      "
-      "value_double: -0.00383255\n      value_double: -0.165267\n      "
-      "value_double: 0.0271328\n      value_double: 0.0729822\n      "
-      "value_double: 0.200795\n      value_double: 0.100276\n      "
-      "value_double: 0.285254\n      value_double: -0.171945\n      "
-      "value_double: -0.0187411\n      value_double: -0.218729\n      "
-      "value_double: 0.233753\n      value_double: 0.109184\n      "
-      "value_double: 0.247875\n      value_double: -0.224632\n      "
-      "value_double: 0.0940739\n      value_double: 0.00663087\n      "
-      "value_double: -0.075786\n      value_double: -0.179992\n      "
-      "value_double: -0.276016\n      value_double: 0.261207\n      "
-      "value_double: -0.0658191\n      value_double: -0.0747132\n      "
-      "value_double: -0.0839638\n      value_double: -0.0825393\n      "
-      "value_double: 0.0915958\n      value_double: -0.195425\n      "
-      "value_double: -0.255836\n      value_double: -0.08745\n      "
-      "value_double: -0.181623\n      value_double: -0.235936\n      "
-      "value_double: 0.0205423\n      value_double: 0.185447\n      "
-      "value_double: -0.0691599\n      value_double: -0.0451089\n      "
-      "value_double: -0.153922\n      value_double: -0.0279411\n      "
-      "value_double: 0.148915\n      value_double: -0.018026\n      "
-      "value_double: -0.144903\n      value_double: 0.0370046\n      "
-      "value_double: 0.0764987\n      value_double: 0.0586488\n      "
-      "value_double: -0.222919\n      value_double: 0.0238447\n      "
-      "value_double: -0.106012\n      value_double: -0.102202\n      "
-      "value_double: -0.159347\n      value_double: -0.0232876\n      "
-      "value_double: 0.109855\n      value_double: -0.141833\n      "
-      "value_double: 0.1376\n      value_double: -0.12413\n      value_double: "
-      "-0.208968\n      value_double: 0.0758635\n      value_double: "
-      "-0.217672\n      value_double: -0.20153\n      value_double: "
-      "-0.195414\n      value_double: -0.18549\n      value_double: "
-      "0.00298014\n      value_double: -0.279283\n      value_double: "
-      "0.200084\n      value_double: -0.0968328\n      value_double: -0.243\n  "
-      "    value_double: 0.239319\n      value_double: -0.236288\n      "
-      "value_double: 0.169477\n      value_double: 0.126673\n      "
-      "value_double: 0.182215\n      value_double: -0.028243\n      "
-      "value_double: 0.282762\n      value_double: -0.165548\n      "
-      "value_double: -0.0641245\n      value_double: -0.186382\n      "
-      "value_double: 0.0329038\n      value_double: 0.271848\n      "
-      "value_double: 0.084653\n      value_double: -0.108163\n      "
-      "value_double: 0.247094\n      value_double: 0.192687\n      "
-      "value_double: 0.171922\n      value_double: -0.187649\n      "
-      "value_double: 0.251253\n      value_double: 0.272077\n      "
-      "value_double: 0.19068\n      value_double: 0.220352\n      "
-      "value_double: -0.255741\n      value_double: 0.110853\n      "
-      "value_double: 0.146625\n      value_double: 0.167754\n      "
-      "value_double: 0.249554\n    }\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 0\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 0\n    run_count: 0\n    total_run_count: 0\n    "
-      "total_definition_count: 1\n  }\n  float_ops: 0\n  total_float_ops: 0\n  "
-      "accelerator_exec_micros: 0\n  cpu_exec_micros: 0\n  "
-      "total_accelerator_exec_micros: 0\n  total_cpu_exec_micros: 0\n  "
-      "run_count: 0\n  total_run_count: 0\n  total_definition_count: "
-      "3\n}\nchildren {\n  name: \"conv2d_1\"\n  exec_micros: 0\n  "
-      "requested_bytes: 0\n  total_exec_micros: 0\n  total_requested_bytes: "
-      "0\n  total_parameters: 230\n  children {\n    name: \"conv2d_1/bias\"\n "
-      "   exec_micros: 0\n    requested_bytes: 0\n    parameters: 5\n    "
-      "total_exec_micros: 0\n    total_requested_bytes: 0\n    "
-      "total_parameters: 5\n    float_ops: 0\n    total_float_ops: 0\n    "
-      "tensor_value {\n      dtype: DT_FLOAT\n      value_double: 0\n      "
-      "value_double: 0\n      value_double: 0\n      value_double: 0\n      "
-      "value_double: 0\n    }\n    accelerator_exec_micros: 0\n    "
-      "cpu_exec_micros: 0\n    total_accelerator_exec_micros: 0\n    "
-      "total_cpu_exec_micros: 0\n    run_count: 0\n    total_run_count: 0\n    "
-      "total_definition_count: 1\n  }\n  children {\n    name: "
-      "\"conv2d_1/kernel\"\n    exec_micros: 0\n    requested_bytes: 0\n    "
-      "parameters: 225\n    total_exec_micros: 0\n    total_requested_bytes: "
-      "0\n    total_parameters: 225\n    float_ops: 0\n    total_float_ops: "
-      "0\n    tensor_value {\n      dtype: DT_FLOAT\n      value_double: "
-      "-0.00170514\n      value_double: 0.138601\n      value_double: "
-      "-0.224822\n      value_double: -0.0848449\n      value_double: "
-      "0.170551\n      value_double: 0.147666\n      value_double: "
-      "-0.0570606\n      value_double: -0.132805\n      value_double: "
-      "-0.172013\n      value_double: 0.249707\n      value_double: 0.149734\n "
-      "     value_double: 0.0365986\n      value_double: -0.0923146\n      "
-      "value_double: -0.17745\n      value_double: -0.169978\n      "
-      "value_double: -0.173298\n      value_double: -0.110407\n      "
-      "value_double: 0.1469\n      value_double: 0.0419576\n      "
-      "value_double: 0.0391093\n      value_double: -0.137381\n      "
-      "value_double: 0.212642\n      value_double: -0.067034\n      "
-      "value_double: -0.0727709\n      value_double: -0.0276531\n      "
-      "value_double: 0.218212\n      value_double: 0.0596479\n      "
-      "value_double: -0.0468102\n      value_double: -0.0250467\n      "
-      "value_double: -0.20391\n      value_double: -0.233801\n      "
-      "value_double: 0.135615\n      value_double: -0.182124\n      "
-      "value_double: 0.254205\n      value_double: 0.0819146\n      "
-      "value_double: -0.146696\n      value_double: -0.20095\n      "
-      "value_double: -0.250555\n      value_double: -0.226406\n      "
-      "value_double: 0.0421331\n      value_double: 0.0361264\n      "
-      "value_double: -0.188558\n      value_double: -0.0222711\n      "
-      "value_double: -0.128226\n      value_double: -0.148305\n      "
-      "value_double: -0.137598\n      value_double: -0.041647\n      "
-      "value_double: -0.0574933\n      value_double: 0.122506\n      "
-      "value_double: 0.0415936\n      value_double: 0.244957\n      "
-      "value_double: 0.00372121\n      value_double: -0.139939\n      "
-      "value_double: 0.250411\n      value_double: -0.23848\n      "
-      "value_double: -0.0717569\n      value_double: -0.00884159\n      "
-      "value_double: 0.135616\n      value_double: -0.0493895\n      "
-      "value_double: 0.254308\n      value_double: -0.181419\n      "
-      "value_double: -0.114829\n      value_double: -0.172638\n      "
-      "value_double: 0.06984\n      value_double: -0.086704\n      "
-      "value_double: 0.168515\n      value_double: -0.152275\n      "
-      "value_double: -0.230775\n      value_double: -0.254366\n      "
-      "value_double: -0.115397\n      value_double: 0.0418207\n      "
-      "value_double: -0.199607\n      value_double: -0.167001\n      "
-      "value_double: -0.187238\n      value_double: 0.0196097\n      "
-      "value_double: 0.201653\n      value_double: -0.143758\n      "
-      "value_double: 0.167187\n      value_double: -0.129141\n      "
-      "value_double: 0.230154\n      value_double: -0.119968\n      "
-      "value_double: -0.121843\n      value_double: -0.0118565\n      "
-      "value_double: 0.0285747\n      value_double: -0.0593699\n      "
-      "value_double: -0.175214\n      value_double: -0.211524\n      "
-      "value_double: 0.167042\n      value_double: -0.216357\n      "
-      "value_double: -0.0218886\n      value_double: -0.244211\n      "
-      "value_double: 0.175301\n      value_double: 0.0654932\n      "
-      "value_double: -0.0419763\n      value_double: -0.103275\n      "
-      "value_double: -0.0848433\n      value_double: -0.0845421\n      "
-      "value_double: -0.00269318\n      value_double: -0.145978\n      "
-      "value_double: -0.217061\n      value_double: -0.0937043\n      "
-      "value_double: 0.235796\n      value_double: -0.0893372\n      "
-      "value_double: 0.000827968\n      value_double: 0.0172743\n      "
-      "value_double: -0.234205\n      value_double: -0.0867703\n      "
-      "value_double: 0.131704\n      value_double: 0.134143\n      "
-      "value_double: -0.162257\n      value_double: -0.129706\n      "
-      "value_double: 0.0763288\n      value_double: 0.156988\n      "
-      "value_double: 0.220033\n      value_double: -0.179884\n      "
-      "value_double: 0.066697\n      value_double: 0.212322\n      "
-      "value_double: -0.0961226\n      value_double: -0.11223\n      "
-      "value_double: 0.249944\n      value_double: 0.115673\n      "
-      "value_double: -0.100203\n      value_double: 0.125645\n      "
-      "value_double: -0.256104\n      value_double: 0.0996534\n      "
-      "value_double: 0.167306\n      value_double: -0.00700775\n      "
-      "value_double: 0.242145\n      value_double: 0.088406\n      "
-      "value_double: 0.0975334\n      value_double: -0.0309525\n      "
-      "value_double: -0.0422794\n      value_double: 0.20739\n      "
-      "value_double: 0.113992\n      value_double: 0.253818\n      "
-      "value_double: -0.0857835\n      value_double: 0.223902\n      "
-      "value_double: 0.10291\n      value_double: 0.103091\n      "
-      "value_double: -0.177502\n      value_double: -0.0258242\n      "
-      "value_double: -0.130567\n      value_double: -0.15999\n      "
-      "value_double: -0.101484\n      value_double: 0.0188813\n      "
-      "value_double: 0.160626\n      value_double: 0.0467491\n      "
-      "value_double: 0.193634\n      value_double: -0.0910993\n      "
-      "value_double: 0.0440249\n      value_double: -0.255389\n      "
-      "value_double: -0.240244\n      value_double: -0.213171\n      "
-      "value_double: 0.175978\n      value_double: -0.0251202\n      "
-      "value_double: 0.0943941\n      value_double: -0.196194\n      "
-      "value_double: 0.163395\n      value_double: -0.010777\n      "
-      "value_double: -0.0626751\n      value_double: -0.246234\n      "
-      "value_double: 0.0662063\n      value_double: 0.120589\n      "
-      "value_double: 0.237322\n      value_double: 0.0849243\n      "
-      "value_double: -0.066591\n      value_double: 0.0512236\n      "
-      "value_double: -0.144309\n      value_double: -0.235415\n      "
-      "value_double: -0.0565311\n      value_double: 0.0882529\n      "
-      "value_double: -0.215923\n      value_double: -0.0873292\n      "
-      "value_double: -0.0691103\n      value_double: -0.00238678\n      "
-      "value_double: 0.147789\n      value_double: -0.124451\n      "
-      "value_double: 0.205044\n      value_double: -0.0596834\n      "
-      "value_double: 0.0268479\n      value_double: 0.0857448\n      "
-      "value_double: -0.0923855\n      value_double: -0.0960547\n      "
-      "value_double: 0.169869\n      value_double: 0.16988\n      "
-      "value_double: -0.032271\n      value_double: -0.120731\n      "
-      "value_double: -0.199086\n      value_double: 0.181199\n      "
-      "value_double: 0.00897732\n      value_double: -0.257469\n      "
-      "value_double: -0.135556\n      value_double: -0.149663\n      "
-      "value_double: -0.00990398\n      value_double: 0.221165\n      "
-      "value_double: 0.0327134\n      value_double: -0.0392821\n      "
-      "value_double: -0.0614503\n      value_double: 0.246602\n      "
-      "value_double: -0.171692\n      value_double: -0.150835\n      "
-      "value_double: -0.13854\n      value_double: -0.244668\n      "
-      "value_double: 0.0790781\n      value_double: 0.212678\n      "
-      "value_double: 0.0782059\n      value_double: -0.177888\n      "
-      "value_double: -0.165914\n      value_double: -0.164251\n      "
-      "value_double: 0.165007\n      value_double: 0.239615\n      "
-      "value_double: -0.217642\n      value_double: -0.219843\n      "
-      "value_double: 0.0828398\n      value_double: 0.00272235\n      "
-      "value_double: -0.0323662\n      value_double: -0.255953\n      "
-      "value_double: 0.237298\n      value_double: -0.0896481\n      "
-      "value_double: -0.0605349\n      value_double: 0.231679\n      "
-      "value_double: -0.123842\n      value_double: 0.0858642\n      "
-      "value_double: 0.23111\n      value_double: 0.0491742\n    }\n    "
-      "accelerator_exec_micros: 0\n    cpu_exec_micros: 0\n    "
-      "total_accelerator_exec_micros: 0\n    total_cpu_exec_micros: 0\n    "
-      "run_count: 0\n    total_run_count: 0\n    total_definition_count: 1\n  "
-      "}\n  float_ops: 0\n  total_float_ops: 0\n  accelerator_exec_micros: 0\n "
-      " cpu_exec_micros: 0\n  total_accelerator_exec_micros: 0\n  "
-      "total_cpu_exec_micros: 0\n  run_count: 0\n  total_run_count: 0\n  "
-      "total_definition_count: 3\n}\nfloat_ops: 0\ntotal_float_ops: "
-      "0\naccelerator_exec_micros: 0\ncpu_exec_micros: "
-      "0\ntotal_accelerator_exec_micros: 0\ntotal_cpu_exec_micros: "
-      "0\nrun_count: 0\ntotal_run_count: 0\ntotal_definition_count: 6\n",
-      &expected));
-  EXPECT_EQ(expected.DebugString(), root.DebugString());
+  EXPECT_EQ(root.children(0).name(), "DW");
+  EXPECT_GT(root.children(0).tensor_value().value_double_size(), 10);
+  EXPECT_EQ(root.children(1).name(), "DW2");
+  EXPECT_GT(root.children(1).tensor_value().value_double_size(), 10);
+  EXPECT_EQ(root.children(2).name(), "ScalarW");
+  EXPECT_EQ(root.children(2).tensor_value().value_double_size(), 1);
 }
 
 }  // namespace tfprof
diff --git a/tensorflow/core/profiler/internal/tfprof_timeline.cc b/tensorflow/core/profiler/internal/tfprof_timeline.cc
index cfd80b875a..f3934860d9 100644
--- a/tensorflow/core/profiler/internal/tfprof_timeline.cc
+++ b/tensorflow/core/profiler/internal/tfprof_timeline.cc
@@ -147,8 +147,8 @@ void MemoryTracker::TrackNodeConnection(int64 step, const GraphNode* node,
   if (output_idx == node->node->src_output_idx().end()) {
     return;
   }
-  const auto& output = src->node->output_bytes(step).find(output_idx->second);
-  if (output == src->node->output_bytes(step).end()) {
+  const auto& output = src->node->output_memory(step).find(output_idx->second);
+  if (output == src->node->output_memory(step).end()) {
     return;
   }
   int64 output_bytes = output->second.first;
diff --git a/tensorflow/core/profiler/internal/tfprof_timeline_test.cc b/tensorflow/core/profiler/internal/tfprof_timeline_test.cc
index 6842f262c6..2fe3653ec2 100644
--- a/tensorflow/core/profiler/internal/tfprof_timeline_test.cc
+++ b/tensorflow/core/profiler/internal/tfprof_timeline_test.cc
@@ -62,7 +62,8 @@ class TFProfTimelineTest : public ::testing::Test {
 // manually check it's correct
 TEST_F(TFProfTimelineTest, GraphView) {
   string dump_file = io::JoinPath(testing::TmpDir(), "dump");
-  Options opts(10000, 0, 0, 0, 0, 0, 0, "name", {".*"},  // accout_type_regexes
+  Options opts(10000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "name",
+               {".*"},  // account_type_regexes
                {".*"}, {""}, {".*"}, {""}, false,
                {"params", "bytes", "micros", "float_ops"}, "timeline",
                {{"outfile", dump_file}});
@@ -70,12 +71,13 @@ TEST_F(TFProfTimelineTest, GraphView) {
 
   string dump_str;
   TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
-  EXPECT_EQ(5576767607271035974ull, Hash64(dump_str));
+  EXPECT_EQ(16947107375505024864ull, Hash64(dump_str));
 }
 
 TEST_F(TFProfTimelineTest, ScopeView) {
   string dump_file = io::JoinPath(testing::TmpDir(), "dump");
-  Options opts(5, 0, 0, 0, 0, 0, 0, "name", {".*"},  // accout_type_regexes
+  Options opts(5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "name",
+               {".*"},  // account_type_regexes
                {".*"}, {""}, {".*"}, {""}, false,
                {"params", "bytes", "micros", "float_ops"}, "timeline",
                {{"outfile", dump_file}});
@@ -83,7 +85,7 @@ TEST_F(TFProfTimelineTest, ScopeView) {
 
   string dump_str;
   TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
-  EXPECT_EQ(10135186027625211652ull, Hash64(dump_str));
+  EXPECT_EQ(2710044785377031280ull, Hash64(dump_str));
 }
 
 // TODO(xpan): tfprof_log is too large to include in testdata when adding
diff --git a/tensorflow/core/profiler/internal/tfprof_utils.cc b/tensorflow/core/profiler/internal/tfprof_utils.cc
index 464a13f7df..383c4725b7 100644
--- a/tensorflow/core/profiler/internal/tfprof_utils.cc
+++ b/tensorflow/core/profiler/internal/tfprof_utils.cc
@@ -140,35 +140,66 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
       ++i;
     } else if (pieces[i] == tensorflow::tfprof::kOptions[2]) {
       if (pieces.size() <= i + 1 ||
-          !strings::safe_strto64(pieces[i + 1], &opts->min_micros)) {
+          !strings::safe_strto64(pieces[i + 1], &opts->min_peak_bytes)) {
         return ReturnError(pieces, i);
       }
       ++i;
     } else if (pieces[i] == tensorflow::tfprof::kOptions[3]) {
       if (pieces.size() <= i + 1 ||
-          !strings::safe_strto64(pieces[i + 1], &opts->min_params)) {
+          !strings::safe_strto64(pieces[i + 1], &opts->min_residual_bytes)) {
         return ReturnError(pieces, i);
       }
       ++i;
     } else if (pieces[i] == tensorflow::tfprof::kOptions[4]) {
       if (pieces.size() <= i + 1 ||
-          !strings::safe_strto64(pieces[i + 1], &opts->min_float_ops)) {
+          !strings::safe_strto64(pieces[i + 1], &opts->min_output_bytes)) {
         return ReturnError(pieces, i);
       }
       ++i;
     } else if (pieces[i] == tensorflow::tfprof::kOptions[5]) {
       if (pieces.size() <= i + 1 ||
-          !strings::safe_strto64(pieces[i + 1], &opts->min_occurrence)) {
+          !strings::safe_strto64(pieces[i + 1], &opts->min_micros)) {
         return ReturnError(pieces, i);
       }
       ++i;
     } else if (pieces[i] == tensorflow::tfprof::kOptions[6]) {
       if (pieces.size() <= i + 1 ||
-          !strings::safe_strto64(pieces[i + 1], &opts->step)) {
+          !strings::safe_strto64(pieces[i + 1],
+                                 &opts->min_accelerator_micros)) {
         return ReturnError(pieces, i);
       }
       ++i;
     } else if (pieces[i] == tensorflow::tfprof::kOptions[7]) {
+      if (pieces.size() <= i + 1 ||
+          !strings::safe_strto64(pieces[i + 1], &opts->min_cpu_micros)) {
+        return ReturnError(pieces, i);
+      }
+      ++i;
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[8]) {
+      if (pieces.size() <= i + 1 ||
+          !strings::safe_strto64(pieces[i + 1], &opts->min_params)) {
+        return ReturnError(pieces, i);
+      }
+      ++i;
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[9]) {
+      if (pieces.size() <= i + 1 ||
+          !strings::safe_strto64(pieces[i + 1], &opts->min_float_ops)) {
+        return ReturnError(pieces, i);
+      }
+      ++i;
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[10]) {
+      if (pieces.size() <= i + 1 ||
+          !strings::safe_strto64(pieces[i + 1], &opts->min_occurrence)) {
+        return ReturnError(pieces, i);
+      }
+      ++i;
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[11]) {
+      if (pieces.size() <= i + 1 ||
+          !strings::safe_strto64(pieces[i + 1], &opts->step)) {
+        return ReturnError(pieces, i);
+      }
+      ++i;
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[12]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
@@ -180,42 +211,42 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
       }
       opts->order_by = *order_by;
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[8]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[13]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
       opts->account_type_regexes = str_util::Split(StripQuote(pieces[i + 1]),
                                                    ',', str_util::SkipEmpty());
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[9]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[14]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
       opts->start_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
                                                  str_util::SkipEmpty());
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[10]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[15]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
       opts->trim_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
                                                 str_util::SkipEmpty());
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[11]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[16]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
       opts->show_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
                                                 str_util::SkipEmpty());
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[12]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[17]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
       opts->hide_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
                                                 str_util::SkipEmpty());
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[13]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[18]) {
       if ((pieces.size() > i + 1 && pieces[i + 1].find("-") == 0) ||
           pieces.size() == i + 1) {
         opts->account_displayed_op_only = true;
@@ -225,7 +256,7 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
       } else {
         ++i;
       }
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[14]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[19]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
@@ -242,7 +273,7 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
       }
       opts->select = requested_set;
       ++i;
-    } else if (pieces[i] == tensorflow::tfprof::kOptions[15]) {
+    } else if (pieces[i] == tensorflow::tfprof::kOptions[20]) {
       if (pieces.size() <= i + 1) {
         return ReturnError(pieces, i);
       }
diff --git a/tensorflow/core/profiler/profiler.cc b/tensorflow/core/profiler/profiler.cc
index ade478367e..6acf4ea377 100644
--- a/tensorflow/core/profiler/profiler.cc
+++ b/tensorflow/core/profiler/profiler.cc
@@ -72,7 +72,12 @@ int Run(int argc, char** argv) {
   string FLAGS_checkpoint_path = "";
   int32 FLAGS_max_depth = 10;
   int64 FLAGS_min_bytes = 0;
+  int64 FLAGS_min_peak_bytes = 0;
+  int64 FLAGS_min_residual_bytes = 0;
+  int64 FLAGS_min_output_bytes = 0;
   int64 FLAGS_min_micros = 0;
+  int64 FLAGS_min_accelerator_micros = 0;
+  int64 FLAGS_min_cpu_micros = 0;
   int64 FLAGS_min_params = 0;
   int64 FLAGS_min_float_ops = 0;
   int64 FLAGS_min_occurrence = 0;
@@ -101,7 +106,14 @@ int Run(int argc, char** argv) {
            "TensorFlow Checkpoint file name"),
       Flag("max_depth", &FLAGS_max_depth, "max depth"),
       Flag("min_bytes", &FLAGS_min_bytes, "min_bytes"),
+      Flag("min_peak_bytes", &FLAGS_min_peak_bytes, "min_peak_bytes"),
+      Flag("min_residual_bytes", &FLAGS_min_residual_bytes,
+           "min_residual_bytes"),
+      Flag("min_output_bytes", &FLAGS_min_output_bytes, "min_output_bytes"),
       Flag("min_micros", &FLAGS_min_micros, "min micros"),
+      Flag("min_accelerator_micros", &FLAGS_min_accelerator_micros,
+           "min accelerator_micros"),
+      Flag("min_cpu_micros", &FLAGS_min_cpu_micros, "min_cpu_micros"),
       Flag("min_params", &FLAGS_min_params, "min params"),
       Flag("min_float_ops", &FLAGS_min_float_ops, "min float ops"),
       Flag("min_occurrence", &FLAGS_min_occurrence, "min occurrence"),
@@ -214,12 +226,14 @@ int Run(int argc, char** argv) {
     return 0;
   }
 
-  Options opts(FLAGS_max_depth, FLAGS_min_bytes, FLAGS_min_micros,
-               FLAGS_min_params, FLAGS_min_float_ops, FLAGS_min_occurrence,
-               FLAGS_step, FLAGS_order_by, account_type_regexes,
-               start_name_regexes, trim_name_regexes, show_name_regexes,
-               hide_name_regexes, FLAGS_account_displayed_op_only, select,
-               output_type, output_options);
+  Options opts(
+      FLAGS_max_depth, FLAGS_min_bytes, FLAGS_min_peak_bytes,
+      FLAGS_min_residual_bytes, FLAGS_min_output_bytes, FLAGS_min_micros,
+      FLAGS_min_accelerator_micros, FLAGS_min_cpu_micros, FLAGS_min_params,
+      FLAGS_min_float_ops, FLAGS_min_occurrence, FLAGS_step, FLAGS_order_by,
+      account_type_regexes, start_name_regexes, trim_name_regexes,
+      show_name_regexes, hide_name_regexes, FLAGS_account_displayed_op_only,
+      select, output_type, output_options);
 
   if (cmd == kCmds[2] || cmd == kCmds[3]) {
     tf_stat.BuildView(cmd);
diff --git a/tensorflow/core/profiler/tfprof_options.proto b/tensorflow/core/profiler/tfprof_options.proto
index 5882833039..b53288d351 100644
--- a/tensorflow/core/profiler/tfprof_options.proto
+++ b/tensorflow/core/profiler/tfprof_options.proto
@@ -7,7 +7,12 @@ package tensorflow.tfprof;
 message OptionsProto {
   int64 max_depth = 1;
   int64 min_bytes = 2;
+  int64 min_peak_bytes = 19;
+  int64 min_residual_bytes = 20;
+  int64 min_output_bytes = 21;
   int64 min_micros = 3;
+  int64 min_accelerator_micros = 22;
+  int64 min_cpu_micros = 23;
   int64 min_params = 4;
   int64 min_float_ops = 5;
   int64 min_occurrence = 17;
diff --git a/tensorflow/core/profiler/tfprof_output.proto b/tensorflow/core/profiler/tfprof_output.proto
index 5c9f132243..4a6068da40 100644
--- a/tensorflow/core/profiler/tfprof_output.proto
+++ b/tensorflow/core/profiler/tfprof_output.proto
@@ -28,8 +28,15 @@ message GraphNodeProto {
   int64 accelerator_exec_micros = 17;
   int64 cpu_exec_micros = 18;
 
-  // Total requested bytes by the op.
+  // Total bytes requested by the op.
   int64 requested_bytes = 3;
+  // Max bytes allocated and being used by the op at a point.
+  int64 peak_bytes = 24;
+  // Total bytes requested by the op and not released before end.
+  int64 residual_bytes = 25;
+  // Total bytes output by the op (not necessarily allocated by the op).
+  int64 output_bytes = 26;
+
   // Number of parameters if available.
   int64 parameters = 4;
   // Number of float operations.
@@ -49,6 +56,10 @@ message GraphNodeProto {
   int64 total_cpu_exec_micros = 20;
 
   int64 total_requested_bytes = 7;
+  int64 total_peak_bytes = 27;
+  int64 total_residual_bytes = 28;
+  int64 total_output_bytes = 29;
+
   int64 total_parameters = 8;
   int64 total_float_ops = 14;
 
@@ -81,6 +92,13 @@ message MultiGraphNodeProto {
 
   // Total requested bytes by the code.
   int64 requested_bytes = 3;
+  // Max bytes allocated and being used by the op at a point.
+  int64 peak_bytes = 16;
+  // Total bytes requested by the op and not released before end.
+  int64 residual_bytes = 17;
+  // Total bytes output by the op (not necessarily allocated by the op).
+  int64 output_bytes = 18;
+
   // Number of parameters if available.
   int64 parameters = 4;
   // Number of float operations.
@@ -93,6 +111,10 @@ message MultiGraphNodeProto {
   int64 total_cpu_exec_micros = 15;
 
   int64 total_requested_bytes = 7;
+  int64 total_peak_bytes = 19;
+  int64 total_residual_bytes = 20;
+  int64 total_output_bytes = 21;
+
   int64 total_parameters = 8;
   int64 total_float_ops = 9;
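
The field comments above distinguish `requested_bytes`, `peak_bytes` and `residual_bytes`. As a rough illustration of how the three can differ for one op, here is a minimal sketch that derives them from a list of signed allocation deltas. `MemorySummary` and `Summarize` are hypothetical names invented for this example, and the event model (positive delta for an allocation, negative for a release) is an assumption. It is not the accounting code used by tfprof, and `output_bytes` is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary type; the field names mirror GraphNodeProto.
struct MemorySummary {
  int64_t requested_bytes = 0;  // Sum of all bytes requested.
  int64_t peak_bytes = 0;       // Largest number of bytes live at one time.
  int64_t residual_bytes = 0;   // Bytes still live after the last event.
};

// Assumption: each entry is a signed byte delta for one op, positive for an
// allocation and negative for a release, in the order the events occurred.
MemorySummary Summarize(const std::vector<int64_t>& deltas) {
  MemorySummary summary;
  int64_t live = 0;
  for (int64_t delta : deltas) {
    if (delta > 0) summary.requested_bytes += delta;
    live += delta;
    assert(live >= 0);  // A release must not exceed what is currently live.
    summary.peak_bytes = std::max(summary.peak_bytes, live);
  }
  summary.residual_bytes = live;
  return summary;
}
```

For the deltas `{100, 50, -100, 30, -30}` the op requests 180 bytes in total, holds at most 150 bytes at once, and still holds 50 bytes at the end. A filter such as `min_peak_bytes` can therefore select a different set of nodes than `min_bytes`.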