Merge commit for internal changes

author: Michael Case <mikecase@google.com> 2018-02-08 00:57:18 -0800
committer: Michael Case <mikecase@google.com> 2018-02-08 00:57:18 -0800
commit: 348bf0c436e4f571022bc08d0699e3b257125467
tree: d6dcbde54352ce501c3e9115117e5bec07ce6579 /RELEASE.md
parent: f7f7036d1cdc5716aff976fae0ea4d1b9a931b56
parent: 78e4ed153a853536622ff606fc5f6c48a1573ac6
1 file changed, 22 insertions, 1 deletion
diff --git a/RELEASE.md b/RELEASE.md
index 0fad3b5d41..0720a8c639 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -96,6 +96,27 @@ Yoni Tsafir, yordun, Yuan (Terry) Tang, Yuxin Wu, zhengdi, Zhengsheng Wei, 田
 * Starting from 1.6 release, our prebuilt binaries will use AVX instructions.
   This may break TF on older CPUs.
 
+## Known Bugs
+* Using XLA:GPU with CUDA 9 and CUDA 9.1 produces garbage results and/or
+  `CUDA_ERROR_ILLEGAL_ADDRESS` failures.
+
+  Google discovered in mid-December 2017 that the PTX-to-SASS compiler
+  (`ptxas`) in CUDA 9 and CUDA 9.1 sometimes does not properly compute the
+  carry bit when decomposing 64-bit address calculations with large offsets
+  (e.g. `load [x + large_constant]`) into 32-bit arithmetic in SASS.
+
+  As a result, these versions of `ptxas` miscompile most XLA programs that use
+  more than 4 GB of temporary memory, which produces garbage results and/or
+  `CUDA_ERROR_ILLEGAL_ADDRESS` failures.
+
+  A fix in CUDA 9.1.121 is expected in late February 2018.  We do not expect a
+  fix for CUDA 9.0.x.  Until the fix is available, the only workaround is to
+  [downgrade](https://developer.nvidia.com/cuda-toolkit-archive) to CUDA 8.0.x
+  or disable XLA:GPU.
+
+  TensorFlow will print a warning if you use XLA:GPU with a known-bad version of
+  CUDA; see commit e00ba24c4038e7644da417ddc639169b6ea59122.
+
 ## Major Features And Improvements
 * [Eager execution](https://github.com/tensorflow/tensorflow/tree/r1.5/tensorflow/contrib/eager)
   preview version is now available.
@@ -633,7 +654,7 @@ answered questions, and were part of inspiring discussions.
 * Fixed LIBXSMM integration.
 * Make decode_jpeg/decode_png/decode_gif handle all formats, since users frequently try to decode an image as the wrong type.
 * Improve implicit broadcasting lowering.
-* Improving stability of GCS/Bigquery clients by a faster retrying of stale transmissions.
+* Improving stability of GCS/BigQuery clients by a faster retrying of stale transmissions.
 * Remove OpKernelConstruction::op_def() as part of minimizing proto dependencies.
 * VectorLaplaceDiag distribution added.
 * Android demo no longer requires libtensorflow_demo.so to run (libtensorflow_inference.so still required)
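The carry-bit failure described in the Known Bugs note can be illustrated with a short Python sketch. This is not XLA or `ptxas` code: `add64_via_32` is a hypothetical helper that adds a 64-bit base address and offset using two 32-bit additions, and its `propagate_carry=False` mode assumes the bug amounts to dropping the carry out of the low-half add (the note itself only says the carry is sometimes computed incorrectly).

```python
MASK32 = 0xFFFF_FFFF


def split64(value):
    """Split a 64-bit value into its (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def add64_via_32(base, offset, propagate_carry=True):
    """Compute (base + offset) mod 2**64 using only 32-bit additions.

    Hypothetical helper. propagate_carry=False models the miscompilation
    as simply dropping the carry out of the low-half addition.
    """
    base_lo, base_hi = split64(base)
    off_lo, off_hi = split64(offset)

    lo_sum = base_lo + off_lo        # may exceed 32 bits
    lo = lo_sum & MASK32             # low 32 bits of the result
    carry = (lo_sum >> 32) if propagate_carry else 0

    hi = (base_hi + off_hi + carry) & MASK32
    return (hi << 32) | lo


base = 0x1_F000_0000     # example base address
offset = 0x1_2000_0000   # example "large_constant", about 4.5 GB
print(hex(add64_via_32(base, offset)))                         # 0x310000000
print(hex(add64_via_32(base, offset, propagate_carry=False)))  # 0x210000000
```

Here the low halves overflow, so dropping the carry places the access exactly 2^32 bytes (4 GB) below the intended address. Offsets only reach that range when buffers span more than 4 GB, which is one way to see why the note singles out programs using that much temporary memory.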
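The affected-version rule stated in the Known Bugs note (every CUDA 9.0.x release, plus CUDA 9.1 releases before the expected 9.1.121 fix) can be written as a small predicate. This is a hypothetical sketch, not the check added in commit e00ba24c4038e7644da417ddc639169b6ea59122: the function names, the `"major.minor[.patch]"` input format, and the assumption that a missing patch number means `0` are all illustrative choices, and the rule encodes only what this note says.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse "major.minor[.patch]" into a 3-tuple, padding missing parts with 0.

    Assumption: version strings use this dotted numeric format.
    """
    parts = tuple(int(p) for p in text.split("."))
    padded = parts + (0,) * (3 - len(parts))
    return padded[:3]


def is_known_bad_cuda(version: str) -> bool:
    """Hypothetical check mirroring the release note's affected versions."""
    v = parse_version(version)
    if v[:2] == (9, 0):
        # The note does not expect a fix for CUDA 9.0.x.
        return True
    # CUDA 9.1 releases before the expected fix in 9.1.121.
    return (9, 1, 0) <= v < (9, 1, 121)


for candidate in ["8.0.61", "9.0.176", "9.1.85", "9.1.121"]:
    print(candidate, is_known_bad_cuda(candidate))
```

Tuple comparison keeps the rule readable; since the 9.1.121 fix was only "expected" when this note was written, the upper bound is the part most likely to need revisiting.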