[Docs]: Reduce over-estimation while measuring compute time.

Inspired by: https://stackoverflow.com/questions/51717817/performance-measurement-in-tensorflows-eager-mode

PiperOrigin-RevId: 207752918
author: Asim Shankar <ashankar@google.com> 2018-08-07 11:42:23 -0700
committer: TensorFlower Gardener <gardener@tensorflow.org> 2018-08-07 12:03:33 -0700
commit: c3d1f4bc30c2cc5e0999ac2b0f04d41d607cb1fe (patch)
tree: a321b14d8d73ceaa9683d6cbcc4609d1704599b6 /tensorflow/docs_src
parent: 01d734d3778c43ada5d56dce87b4f8ba61b5b560 (diff)
1 files changed, 9 insertions, 3 deletions
diff --git a/tensorflow/docs_src/guide/eager.md b/tensorflow/docs_src/guide/eager.md
index 3b54d6d2bb..24f6e4ee95 100644
--- a/tensorflow/docs_src/guide/eager.md
+++ b/tensorflow/docs_src/guide/eager.md
@@ -727,7 +727,13 @@ def measure(x, steps):
   start = time.time()
   for i in range(steps):
     x = tf.matmul(x, x)
-    _ = x.numpy()  # Make sure to execute op and not just enqueue it
+  # tf.matmul can return before completing the matrix multiplication
+  # (e.g., can return after enqueuing the operation on a CUDA stream).
+  # The x.numpy() call below will ensure that all enqueued operations
+  # have completed (and will also copy the result to host memory,
+  # so we're including a little more than just the matmul operation
+  # time).
+  _ = x.numpy()
   end = time.time()
   return end - start
 
@@ -751,8 +757,8 @@ Output (exact numbers depend on hardware):
 
 ```
 Time to multiply a (1000, 1000) matrix by itself 200 times:
-CPU: 4.614904403686523 secs
-GPU: 0.5581181049346924 secs
+CPU: 1.46628093719 secs
+GPU: 0.0593810081482 secs
 ```
 
 A `tf.Tensor` object can be copied to a different device to execute its
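
The effect of moving the `x.numpy()` call out of the loop can be illustrated without TensorFlow. The sketch below is only an analogy, not TensorFlow's actual dispatch mechanism: a single-worker `ThreadPoolExecutor` stands in for a device stream, and `OP_COST`, `COPY_COST`, `device_op`, and `to_host` are hypothetical names and values chosen for illustration. Under those assumptions, synchronizing on every step pays the simulated copy cost once per iteration, while synchronizing once pays it a single time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical costs, chosen only to make the difference visible.
OP_COST = 0.002    # simulated time for one queued operation, in seconds
COPY_COST = 0.005  # simulated device-to-host copy time, in seconds


def device_op():
    """Stand-in for an operation that runs asynchronously on a device."""
    time.sleep(OP_COST)


def to_host(future):
    """Stand-in for x.numpy(): wait for queued work, then pay a copy cost."""
    future.result()        # blocks until this (and all earlier) work is done
    time.sleep(COPY_COST)  # simulated copy to host memory


def measure_sync_every_step(stream, steps):
    """Old pattern: synchronize and copy inside the loop."""
    start = time.perf_counter()
    for _ in range(steps):
        pending = stream.submit(device_op)
        to_host(pending)
    return time.perf_counter() - start


def measure_sync_once(stream, steps):
    """New pattern: enqueue everything, then synchronize and copy once."""
    start = time.perf_counter()
    for _ in range(steps):
        pending = stream.submit(device_op)
    to_host(pending)  # a single worker runs tasks in order, so this waits for all
    return time.perf_counter() - start


if __name__ == "__main__":
    # One worker thread so that queued operations execute in FIFO order.
    with ThreadPoolExecutor(max_workers=1) as stream:
        print("sync every step:", measure_sync_every_step(stream, 20), "secs")
        print("sync once:      ", measure_sync_once(stream, 20), "secs")
```

Both functions do the same amount of simulated compute. The gap between the two timings comes from the repeated synchronization and copy, which is the over-estimation this change removes from the guide's example.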