author | Michael Case <mikecase@google.com> | 2018-02-08 00:57:18 -0800 |
---|---|---|
committer | Michael Case <mikecase@google.com> | 2018-02-08 00:57:18 -0800 |
commit | 348bf0c436e4f571022bc08d0699e3b257125467 (patch) | |
tree | d6dcbde54352ce501c3e9115117e5bec07ce6579 /RELEASE.md | |
parent | f7f7036d1cdc5716aff976fae0ea4d1b9a931b56 (diff) | |
parent | 78e4ed153a853536622ff606fc5f6c48a1573ac6 (diff) |
Merge commit for internal changes
Diffstat (limited to 'RELEASE.md')
-rw-r--r-- | RELEASE.md | 23 |
1 files changed, 22 insertions, 1 deletions
diff --git a/RELEASE.md b/RELEASE.md
index 0fad3b5d41..0720a8c639 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -96,6 +96,27 @@ Yoni Tsafir, yordun, Yuan (Terry) Tang, Yuxin Wu, zhengdi, Zhengsheng Wei, 田传
 * Starting from 1.6 release, our prebuilt binaries will use AVX instructions.
   This may break TF on older CPUs.
 
+## Known Bugs
+* Using XLA:GPU with CUDA 9 and CUDA 9.1 results in garbage results and/or
+  `CUDA_ILLEGAL_ADDRESS` failures.
+
+  Google discovered in mid-December 2017 that the PTX-to-SASS compiler in CUDA 9
+  and CUDA 9.1 sometimes does not properly compute the carry bit when
+  decomposing 64-bit address calculations with large offsets (e.g. `load [x +
+  large_constant]`) into 32-bit arithmetic in SASS.
+
+  As a result, these versions of `ptxas` miscompile most XLA programs which use
+  more than 4GB of temp memory. This results in garbage results and/or
+  `CUDA_ERROR_ILLEGAL_ADDRESS` failures.
+
+  A fix in CUDA 9.1.121 is expected in late February 2018. We do not expect a
+  fix for CUDA 9.0.x. Until the fix is available, the only workaround is to
+  [downgrade](https://developer.nvidia.com/cuda-toolkit-archive) to CUDA 8.0.x
+  or disable XLA:GPU.
+
+  TensorFlow will print a warning if you use XLA:GPU with a known-bad version of
+  CUDA; see e00ba24c4038e7644da417ddc639169b6ea59122.
+
 ## Major Features And Improvements
 * [Eager execution](https://github.com/tensorflow/tensorflow/tree/r1.5/tensorflow/contrib/eager)
   preview version is now available.
@@ -633,7 +654,7 @@ answered questions, and were part of inspiring discussions.
 * Fixed LIBXSMM integration.
 * Make decode_jpeg/decode_png/decode_gif handle all formats, since users frequently try to decode an image as the wrong type.
 * Improve implicit broadcasting lowering.
-* Improving stability of GCS/Bigquery clients by a faster retrying of stale transmissions.
+* Improving stability of GCS/BigQuery clients by a faster retrying of stale transmissions.
 * Remove OpKernelConstruction::op_def() as part of minimizing proto dependencies.
 * VectorLaplaceDiag distribution added.
 * Android demo no longer requires libtensorflow_demo.so to run (libtensorflow_inference.so still required)
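
The Known Bugs entry added by this commit attributes the failures to a carry bit that is sometimes lost when a 64-bit address calculation is split into 32-bit arithmetic. The Python sketch below only illustrates that class of error; it is not the `ptxas` code path, and the function name, the `propagate_carry` flag, and the sample addresses are hypothetical values chosen for the demonstration.

```python
MASK32 = 0xFFFFFFFF  # low 32 bits of a 64-bit value


def add64_via_32(base: int, offset: int, propagate_carry: bool = True) -> int:
    """Add two unsigned 64-bit values using only 32-bit halves.

    With propagate_carry=False the carry out of the low half is dropped,
    which mimics the kind of miscompilation described in the release note.
    """
    lo_sum = (base & MASK32) + (offset & MASK32)
    carry = (lo_sum >> 32) if propagate_carry else 0
    lo = lo_sum & MASK32
    hi = ((base >> 32) + (offset >> 32) + carry) & MASK32
    return (hi << 32) | lo


# Hypothetical base address and offset: the low halves overflow 32 bits.
base = 0x0000_0007_F000_0000
offset = 0x2000_0000

correct = add64_via_32(base, offset)
broken = add64_via_32(base, offset, propagate_carry=False)

print(hex(correct))           # 0x810000000
print(hex(broken))            # 0x710000000
print(hex(correct - broken))  # 0x100000000, i.e. the address is off by 4 GiB
```

When the carry is dropped, the computed address lands exactly 4 GiB below the intended one. That is consistent with the note's observation that the problem shows up in programs using more than 4GB of temp memory, where the result is either a read of unrelated data or an illegal address.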