author | Asim Shankar <ashankar@google.com> | 2018-08-07 11:42:23 -0700 |
---|---|---|
committer | TensorFlower Gardener <gardener@tensorflow.org> | 2018-08-07 12:03:33 -0700 |
commit | c3d1f4bc30c2cc5e0999ac2b0f04d41d607cb1fe (patch) | |
tree | a321b14d8d73ceaa9683d6cbcc4609d1704599b6 /tensorflow/docs_src | |
parent | 01d734d3778c43ada5d56dce87b4f8ba61b5b560 (diff) |
[Docs]: Reduce over-estimation while measuring compute time.
Inspired by:
https://stackoverflow.com/questions/51717817/performance-measurement-in-tensorflows-eager-mode
PiperOrigin-RevId: 207752918
Diffstat (limited to 'tensorflow/docs_src')
-rw-r--r-- | tensorflow/docs_src/guide/eager.md | 12 |
1 files changed, 9 insertions, 3 deletions
diff --git a/tensorflow/docs_src/guide/eager.md b/tensorflow/docs_src/guide/eager.md
index 3b54d6d2bb..24f6e4ee95 100644
--- a/tensorflow/docs_src/guide/eager.md
+++ b/tensorflow/docs_src/guide/eager.md
@@ -727,7 +727,13 @@ def measure(x, steps):
   start = time.time()
   for i in range(steps):
     x = tf.matmul(x, x)
-    _ = x.numpy()  # Make sure to execute op and not just enqueue it
+  # tf.matmul can return before completing the matrix multiplication
+  # (e.g., can return after enqueing the operation on a CUDA stream).
+  # The x.numpy() call below will ensure that all enqueued operations
+  # have completed (and will also copy the result to host memory,
+  # so we're including a little more than just the matmul operation
+  # time).
+  _ = x.numpy()
   end = time.time()
   return end - start
 
@@ -751,8 +757,8 @@ Output (exact numbers depend on hardware):
 
 ```
 Time to multiply a (1000, 1000) matrix by itself 200 times:
-CPU: 4.614904403686523 secs
-GPU: 0.5581181049346924 secs
+CPU: 1.46628093719 secs
+GPU: 0.0593810081482 secs
 ```
 
 A `tf.Tensor` object can be copied to a different device to execute its
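
Note: the effect this change targets can be illustrated without TensorFlow. The sketch below is a simulation only, not TensorFlow code: `FakeDevice`, `STEP_COST`, and `COPY_COST` are hypothetical names and values, and a single-worker thread pool stands in for a device queue that accepts operations and returns immediately. Synchronizing (and copying) on every step pays the copy cost once per step, while synchronizing once after the loop pays it once and still waits for all queued work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

STEP_COST = 0.01  # assumed: simulated duration of one enqueued op, in seconds
COPY_COST = 0.01  # assumed: simulated device-to-host copy, in seconds


class FakeDevice:
    """Hypothetical stand-in for a device that runs enqueued ops in order."""

    def __init__(self):
        # One worker thread, so ops complete in the order they were submitted.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._last = None
        self.copies = 0

    def matmul(self):
        # Returns right away; the simulated work runs on the worker thread.
        self._last = self._pool.submit(time.sleep, STEP_COST)

    def numpy(self):
        # Blocks until the most recently enqueued op (and therefore every
        # earlier one) has finished, then pays the simulated copy cost.
        if self._last is not None:
            self._last.result()
        time.sleep(COPY_COST)
        self.copies += 1

    def close(self):
        self._pool.shutdown(wait=True)


def measure(steps, sync_every_step):
    """Return (elapsed seconds, number of copies) for `steps` simulated ops."""
    dev = FakeDevice()
    start = time.perf_counter()
    for _ in range(steps):
        dev.matmul()
        if sync_every_step:
            dev.numpy()  # old pattern: wait and copy inside the loop
    if not sync_every_step:
        dev.numpy()  # new pattern: wait and copy once, after the loop
    elapsed = time.perf_counter() - start
    dev.close()
    return elapsed, dev.copies
```

Calling `measure(200, sync_every_step=True)` and `measure(200, sync_every_step=False)` and comparing the two elapsed times shows the per-step copy overhead that the old measurement included; the exact numbers depend on the machine and on the assumed costs.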