Merge changes from github.

Change: 114882676
author: Vijay Vasudevan <vrv@google.com> 2016-02-17 11:42:30 -0800
committer: TensorFlower Gardener <gardener@tensorflow.org> 2016-02-17 12:56:41 -0800
commit: fe056f0b5e52db86766761f5e6446a89c1aa3938
tree: 68bce0e257d181a3fa37f83c97fdff0fdad877fc
parent: 19d632338f983e02dd0268b931e9cced03b74805
153 files changed, 2014 insertions, 1138 deletions
diff --git a/README.md b/README.md
index 09fa9af7bf..74891d0e34 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,13 @@
-#TensorFlow
+<div align="center">
+  <img src="https://www.tensorflow.org/images/tf_logo_transp.png"><br><br>
+</div>
+-----------------
 
-Linux CPU [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master)](http://ci.tensorflow.org/job/tensorflow-master)
-Linux GPU PIP [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master-gpu_pip)](http://ci.tensorflow.org/job/tensorflow-master-gpu_pip)
-Mac OS CPU [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master-mac)](http://ci.tensorflow.org/job/tensorflow-master-mac)
-Android [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master-android)](http://ci.tensorflow.org/job/tensorflow-master-android)
+|  **`Linux CPU`**   |  **`Linux GPU PIP`** | **`Mac OS CPU`** |  **`Android`** |
+|-------------------|----------------------|------------------|----------------|
+| [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master)](http://ci.tensorflow.org/job/tensorflow-master) | [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master-gpu_pip)](http://ci.tensorflow.org/job/tensorflow-master-gpu_pip) | [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master-mac)](http://ci.tensorflow.org/job/tensorflow-master-mac) | [![Build Status](http://ci.tensorflow.org/buildStatus/icon?job=tensorflow-master-android)](http://ci.tensorflow.org/job/tensorflow-master-android) |
 
-TensorFlow is an open source software library for numerical computation using
+**TensorFlow** is an open source software library for numerical computation using
 data flow graphs.  Nodes in the graph represent mathematical operations, while
 the graph edges represent the multidimensional data arrays (tensors) that flow
 between them.  This flexible architecture lets you deploy computation to one
@@ -24,13 +26,11 @@ tracking requests and bugs, but please see
 [Community](tensorflow/g3doc/resources/index.md#community) for general questions
 and discussion.**
 
-# Download and Setup
+## Installation
+*See [Download and Setup](tensorflow/g3doc/get_started/os_setup.md).*
 
-See [install instructions](tensorflow/g3doc/get_started/os_setup.md).
-
-### Try your first TensorFlow program
-
-```sh
+#### *Try your first TensorFlow program*
+```python
 $ python
 
 >>> import tensorflow as tf
@@ -43,11 +43,9 @@ Hello, TensorFlow!
 >>> sess.run(a+b)
 42
 >>>
-
 ```
 
 ##For more information
-
 * [TensorFlow website](http://tensorflow.org)
 * [TensorFlow whitepaper](http://download.tensorflow.org/paper/whitepaper2015.pdf)
 * [Tensorflow MOOC on Udacity] (https://www.udacity.com/course/deep-learning--ud730)
diff --git a/RELEASE.md b/RELEASE.md
index 3f3d66dc6e..350e36df42 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,6 +1,34 @@
-# Changes since last release
+# Release 0.7.0
 
-## Breaking changes to the API
+## Major Features and Improvements
+
+* Allow using any installed Cuda >= 7.0 and cuDNN >= R2, and add support
+  for cuDNN R4
+* Added a `contrib/` directory for unsupported or experimental features, 
+  including higher level `layers` module
+* Added an easy way to add and dynamically load user-defined ops
+* Built out a good suite of tests; things should break less!
+* Added `MetaGraphDef` which makes it easier to save graphs with metadata
+* Added assignments for the "Deep Learning with TensorFlow" Udacity course
+
+
+## Bug Fixes and Other Changes
+
+* Added a versioning framework for `GraphDef`s to ensure compatibility
+* Enforced Python 3 compatibility
+* Internal changes now show up as sensibly separated commits
+* Open-sourced the doc generator
+* Un-fork Eigen
+* Simplified the `BUILD` files and cleaned up C++ headers
+* TensorFlow can now be used as a submodule in another bazel build
+* New ops (e.g., `*fft`, `*_matrix_solve`)
+* Support for more data types in many ops
+* Performance improvements
+* Various bugfixes
+* Documentation fixes and improvements
+
+
+## Breaking Changes to the API
 
 * `AdjustContrast` kernel deprecated, new kernel `AdjustContrastv2` takes and
   outputs float only. `adjust_contrast` now takes all data types.
@@ -33,6 +61,8 @@
   currently maintained for short-term compatibility but will be removed.
 * The non-public `nn.rnn` and the various `nn.seq2seq` methods now return
   just the final state instead of the list of all states.
+* `tf.scatter_update` no longer guarantees that the lexicographically largest
+  index is used for the update when duplicate entries exist.
 * `tf.image.random_crop(image, [height, width])` is now
   `tf.random_crop(image, [height, width, depth])`, and `tf.random_crop` works
   for any rank (not just 3-D images).  The C++ `RandomCrop` op has been replaced
@@ -40,15 +70,37 @@
 * Renamed `tf.test.GetTempDir` and `tf.test.IsBuiltWithCuda` to
   `tf.test.get_temp_dir` and `tf.test.is_built_with_cuda` for PEP-8
   compatibility.
-
-
-## Bug fixes
-
+* `parse_example`'s interface has changed; the old interface is accessible in
+  `legacy_parse_example` (same for related functions).
+* New `Variable`s are not added to the same collection several times even if
+  a list with duplicates is passed to the constructor.
 * The Python API will now properly set the `list` member of `AttrValue` in
   constructed `GraphDef` messages for empty lists.  The serialization of some
   graphs will change, but the change is both forwards and backwards compatible.
   It will break tests that compare a generated `GraphDef` to a golden serialized
-  `GraphDef`.
+  `GraphDef` (which is discouraged).
+
+
+## Thanks to our Contributors
+
+This release contains contributions from many people at Google, as well as:
+
+Akiomi Kamakura, Alex Vig, Alexander Rosenberg Johansen, Andre Cruz, Arun Ahuja,
+Bart Coppens, Bernardo Pires, Carl Vondrick, Cesar Salgado, Chen Yu,
+Christian Jauvin, Damien Aymeric, Dan Vanderkam, Denny Britz, Dongjoon Hyun,
+Eren Güven, Erik Erwitt, Fabrizio Milo, G. Hussain Chinoy, Jim Fleming,
+Joao Felipe Santos, Jonas Meinertz Hansen, Joshi Rekha, Julian Viereck,
+Keiji Ariyama, Kenton Lee, Krishna Sankar, Kristina Chodorow, Linchao Zhu,
+Lukas Krecan, Mark Borgerding, Mark Daoust, Moussa Taifi,
+Nathan Howell, Naveen Sundar Govindarajulu, Nick Sweeting, Niklas Riekenbrauck,
+Olivier Grisel, Patrick Christ, Povilas Liubauskas, Rainer Wasserfuhr,
+Romain Thouvenin, Sagan Bolliger, Sam Abrahams, Taehoon Kim, Timothy J Laurent,
+Vlad Zavidovych, Yangqing Jia, Yi-Lin Juang, Yuxin Wu, Zachary Lipton,
+Zero Chen, Alan Wu, @brchiu, @emmjaykay, @jalammar, @Mandar-Shinde,
+@nsipplswezey, @ninotoshi, @panmari, @prolearner and @rizzomichaelg.
+
+We are also grateful to all who filed issues or helped resolve them, asked and 
+answered questions, and were part of inspiring discussions. 
 
 
 # Release 0.6.0
@@ -65,14 +117,14 @@
   come in later releases.
 
 
-## Bug fixes
+## Bug Fixes
 
 * Lots of fixes to documentation and tutorials, many contributed
   by the public.
 
 * 271 closed issues on github issues.
 
-## Backwards-incompatible changes
+## Backwards-Incompatible Changes
 
 * `tf.nn.fixed_unigram_candidate_sampler` changed its default 'distortion'
   attribute from 0.0 to 1.0. This was a bug in the original release
diff --git a/configure b/configure
index f217de4e93..f1c15f35c6 100755
--- a/configure
+++ b/configure
@@ -1,9 +1,5 @@
 #!/bin/bash
 
-if [ "$TF_UNOFFICIAL_SETTING" == "1" ]; then
-  echo -e "\nWARNING: You are configuring unofficial settings in TensorFlow. Because some external libraries are not backward compatible, these settings are largely untested and unsupported. \n" 1>&2
-fi
-
 ## Set up python-related environment settings
 while true; do
   fromuser=""
@@ -49,14 +45,8 @@ fi
 # Find out where the CUDA toolkit is installed
 while true; do
   # Configure the Cuda SDK version to use.
-  default_cuda_version="7.0"
-  if [ "$TF_UNOFFICIAL_SETTING" == "1" ]; then
-    if [ -z "$TF_CUDA_VERSION" ]; then
-      read -p "Please specify the Cuda SDK version you want to use. [Default is $default_cuda_version]: " TF_CUDA_VERSION
-    fi
-  fi
   if [ -z "$TF_CUDA_VERSION" ]; then
-    TF_CUDA_VERSION=$default_cuda_version
+    read -p "Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: " TF_CUDA_VERSION
   fi
 
   fromuser=""
@@ -68,29 +58,28 @@ while true; do
       CUDA_TOOLKIT_PATH=$default_cuda_path
     fi
   fi
-  if [ -e "$CUDA_TOOLKIT_PATH/lib64/libcudart.so.$TF_CUDA_VERSION" ]; then
+  if [[ -z "$TF_CUDA_VERSION" ]]; then
+    TF_CUDA_EXT=""
+  else
+    TF_CUDA_EXT=".$TF_CUDA_VERSION"
+  fi
+  if [ -e $CUDA_TOOLKIT_PATH/lib64/libcudart.so$TF_CUDA_EXT ]; then
     break
   fi
-  echo "Invalid path to CUDA $TF_CUDA_VERSION toolkit. ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so.$TF_CUDA_VERSION cannot be found"
+  echo "Invalid path to CUDA $TF_CUDA_VERSION toolkit. $CUDA_TOOLKIT_PATH/lib64/libcudart.so$TF_CUDA_EXT cannot be found"
   if [ -z "$fromuser" ]; then
     exit 1
   fi
+  # Retry
   TF_CUDA_VERSION=""
   CUDA_TOOLKIT_PATH=""
-  # Retry
 done
 
 # Find out where the cuDNN library is installed
 while true; do
   # Configure the Cudnn version to use.
-  default_cudnn_version="6.5"
-  if [ "$TF_UNOFFICIAL_SETTING" == "1" ]; then
-    if [ -z "$TF_CUDNN_VERSION" ]; then
-      read -p "Please specify the Cudnn version you want to use. [Default is $default_cudnn_version]: " TF_CUDNN_VERSION
-    fi
-  fi
   if [ -z "$TF_CUDNN_VERSION" ]; then
-    TF_CUDNN_VERSION=$default_cudnn_version
+    read -p "Please specify the Cudnn version you want to use. [Leave empty to use system default]: " TF_CUDNN_VERSION
   fi
 
   fromuser=""
@@ -105,23 +94,27 @@ while true; do
     # Going through one more level of expansion to handle that.
     CUDNN_INSTALL_PATH=$(bash -c "readlink -f $CUDNN_INSTALL_PATH")
   fi
-  if [ -e "$CUDNN_INSTALL_PATH/libcudnn.so.${TF_CUDNN_VERSION}" -o -e "$CUDNN_INSTALL_PATH/lib64/libcudnn.so.${TF_CUDNN_VERSION}" ]; then
+  if [[ -z "$TF_CUDNN_VERSION" ]]; then
+    TF_CUDNN_EXT=""
+  else
+    TF_CUDNN_EXT=".$TF_CUDNN_VERSION"
+  fi
+  if [ -e "$CUDNN_INSTALL_PATH/libcudnn.so${TF_CUDNN_EXT}" -o -e "$CUDNN_INSTALL_PATH/lib64/libcudnn.so${TF_CUDNN_EXT}" ]; then
     break
   fi
   echo "Invalid path to cuDNN ${TF_CUDNN_VERSION} toolkit. Neither of the following two files can be found:"
-  echo "$CUDNN_INSTALL_PATH/lib64/libcudnn.so.${TF_CUDNN_VERSION}"
-  echo "$CUDNN_INSTALL_PATH/libcudnn.so.${TF_CUDNN_VERSION}"
+  echo "$CUDNN_INSTALL_PATH/lib64/libcudnn.so${TF_CUDNN_EXT}"
+  echo "$CUDNN_INSTALL_PATH/libcudnn.so${TF_CUDNN_EXT}"
   if [ -z "$fromuser" ]; then
     exit 1
   fi
+  # Retry
   TF_CUDNN_VERSION=""
   CUDNN_INSTALL_PATH=""
-  # Retry
 done
 
 cat > third_party/gpus/cuda/cuda.config <<EOF
-# CUDA_TOOLKIT_PATH refers to the CUDA toolkit. Tensorflow requires Cuda $TF_CUDA_VERSION
-# at the moment.
+# CUDA_TOOLKIT_PATH refers to the CUDA toolkit.
 CUDA_TOOLKIT_PATH="$CUDA_TOOLKIT_PATH"
 
 # CUDNN_INSTALL_PATH refers to the cuDNN toolkit. The cuDNN header and library
@@ -129,82 +122,75 @@ CUDA_TOOLKIT_PATH="$CUDA_TOOLKIT_PATH"
 # directories separately.
 CUDNN_INSTALL_PATH="$CUDNN_INSTALL_PATH"
 
-# The Cuda SDK version that should be used in this build
-TF_CUDA_VERSION=$TF_CUDA_VERSION
+# The Cuda SDK version that should be used in this build (empty to use libcudart.so symlink)
+TF_CUDA_VERSION=$TF_CUDA_EXT
 
-# The Cudnn version that should be used in this build
-TF_CUDNN_VERSION=$TF_CUDNN_VERSION
+# The Cudnn version that should be used in this build (empty to use libcudnn.so symlink)
+TF_CUDNN_VERSION=$TF_CUDNN_EXT
 
 EOF
 
-function UnofficialSetting() {
-  # Configure the Cuda toolkit version to work with.
-  perl -pi -e "s,CUDA_VERSION = '[0-9\.]*',CUDA_VERSION = '$TF_CUDA_VERSION',s" tensorflow/core/platform/default/build_config.bzl
-  perl -pi -e "s,(GetCudaVersion.*return )\"[0-9\.]*\",\1\"$TF_CUDA_VERSION\",s" tensorflow/stream_executor/dso_loader.cc
+# Configure the Cuda toolkit version to work with.
+perl -pi -e "s,CUDA_VERSION = '[0-9\.]*',CUDA_VERSION = '$TF_CUDA_EXT',s" tensorflow/core/platform/default/build_config.bzl
+perl -pi -e "s,(GetCudaVersion.*return )\"[0-9\.]*\",\1\"$TF_CUDA_EXT\",s" tensorflow/stream_executor/dso_loader.cc
 
-  # Configure the Cudnn version to work with.
-  perl -pi -e "s,CUDNN_VERSION = '[0-9\.]*',CUDNN_VERSION = '$TF_CUDNN_VERSION',s" tensorflow/core/platform/default/build_config.bzl
-  perl -pi -e "s,(GetCudnnVersion.*return )\"[0-9\.]*\",\1\"$TF_CUDNN_VERSION\",s" tensorflow/stream_executor/dso_loader.cc
+# Configure the Cudnn version to work with.
+perl -pi -e "s,CUDNN_VERSION = '[0-9\.]*',CUDNN_VERSION = '$TF_CUDNN_EXT',s" tensorflow/core/platform/default/build_config.bzl
+perl -pi -e "s,(GetCudnnVersion.*return )\"[0-9\.]*\",\1\"$TF_CUDNN_EXT\",s" tensorflow/stream_executor/dso_loader.cc
 
-  # Configure the compute capabilities that TensorFlow builds for.
-  # Since Cuda toolkit is not backward-compatible, this is not guaranteed to work.
-  while true; do
-    fromuser=""
-    if [ -z "$TF_CUDA_COMPUTE_CAPABILITIES" ]; then
+# Configure the compute capabilities that TensorFlow builds for.
+# Since Cuda toolkit is not backward-compatible, this is not guaranteed to work.
+while true; do
+  fromuser=""
+  if [ -z "$TF_CUDA_COMPUTE_CAPABILITIES" ]; then
 cat << EOF
 Please specify a list of comma-separated Cuda compute capabilities you want to build with.
 You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
 Please note that each additional compute capability significantly increases your build time and binary size.
 EOF
-      read -p "[Default is: \"3.5,5.2\"]: " TF_CUDA_COMPUTE_CAPABILITIES
-      fromuser=1
-    fi
-    # Check whether all capabilities from the input is valid
-    COMPUTE_CAPABILITIES=${TF_CUDA_COMPUTE_CAPABILITIES//,/ }
-    ALL_VALID=1
-    for CAPABILITY in $COMPUTE_CAPABILITIES; do
-      if [[ ! "$CAPABILITY" =~ [0-9]+.[0-9]+ ]]; then
-        echo "Invalid compute capability: " $CAPABILITY
-        ALL_VALID=0
-        break
-      fi
-    done
-    if [ "$ALL_VALID" == "0" ]; then
-      if [ -z "$fromuser" ]; then
-        exit 1
-      fi
-    else
+    read -p "[Default is: \"3.5,5.2\"]: " TF_CUDA_COMPUTE_CAPABILITIES
+    fromuser=1
+  fi
+  # Check whether all capabilities from the input is valid
+  COMPUTE_CAPABILITIES=${TF_CUDA_COMPUTE_CAPABILITIES//,/ }
+  ALL_VALID=1
+  for CAPABILITY in $COMPUTE_CAPABILITIES; do
+    if [[ ! "$CAPABILITY" =~ [0-9]+.[0-9]+ ]]; then
+      echo "Invalid compute capability: " $CAPABILITY
+      ALL_VALID=0
       break
     fi
-    TF_CUDA_COMPUTE_CAPABILITIES=""
   done
-
-  if [ ! -z "$TF_CUDA_COMPUTE_CAPABILITIES" ]; then
-    export WARNING="Unofficial setting. DO NOT"" SUBMIT!!!"
-    function CudaGenCodeOpts() {
-      OUTPUT=""
-      for CAPABILITY in $@; do
-        OUTPUT=${OUTPUT}"   \"${CAPABILITY}\",     "
-      done
-      echo $OUTPUT
-    }
-    export CUDA_GEN_CODES_OPTS=$(CudaGenCodeOpts ${TF_CUDA_COMPUTE_CAPABILITIES//,/ })
-    perl -pi -0 -e 's,\n( *)([^\n]*supported_cuda_compute_capabilities\s*=\s*\[).*?(\]),\n\1# $ENV{WARNING}\n\1\2$ENV{CUDA_GEN_CODES_OPTS}\3,s' third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
-    function CudaVersionOpts() {
-      OUTPUT=""
-      for CAPABILITY in $@; do
-        OUTPUT=$OUTPUT"CudaVersion(\"${CAPABILITY}\"), "
-      done
-      echo $OUTPUT
-    }
-    export CUDA_VERSION_OPTS=$(CudaVersionOpts ${TF_CUDA_COMPUTE_CAPABILITIES//,/ })
-    perl -pi -0 -e 's,\n( *)([^\n]*supported_cuda_compute_capabilities\s*=\s*\{).*?(\}),\n\1// $ENV{WARNING}\n\1\2$ENV{CUDA_VERSION_OPTS}\3,s' tensorflow/core/common_runtime/gpu/gpu_device.cc
+  if [ "$ALL_VALID" == "0" ]; then
+    if [ -z "$fromuser" ]; then
+      exit 1
+    fi
+  else
+    break
   fi
-}
+  TF_CUDA_COMPUTE_CAPABILITIES=""
+done
 
-# Only run the unofficial settings when users explicitly choose to.
-if [ "$TF_UNOFFICIAL_SETTING" == "1" ]; then
-  UnofficialSetting
+if [ ! -z "$TF_CUDA_COMPUTE_CAPABILITIES" ]; then
+  export WARNING="Unofficial setting. DO NOT"" SUBMIT!!!"
+  function CudaGenCodeOpts() {
+    OUTPUT=""
+    for CAPABILITY in $@; do
+      OUTPUT=${OUTPUT}"   \"${CAPABILITY}\",     "
+    done
+    echo $OUTPUT
+  }
+  export CUDA_GEN_CODES_OPTS=$(CudaGenCodeOpts ${TF_CUDA_COMPUTE_CAPABILITIES//,/ })
+  perl -pi -0 -e 's,\n( *)([^\n]*supported_cuda_compute_capabilities\s*=\s*\[).*?(\]),\n\1# $ENV{WARNING}\n\1\2$ENV{CUDA_GEN_CODES_OPTS}\3,s' third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
+  function CudaVersionOpts() {
+    OUTPUT=""
+    for CAPABILITY in $@; do
+      OUTPUT=$OUTPUT"CudaVersion(\"${CAPABILITY}\"), "
+    done
+    echo $OUTPUT
+  }
+  export CUDA_VERSION_OPTS=$(CudaVersionOpts ${TF_CUDA_COMPUTE_CAPABILITIES//,/ })
+  perl -pi -0 -e 's,\n( *)([^\n]*supported_cuda_compute_capabilities\s*=\s*\{).*?(\}),\n\1// $ENV{WARNING}\n\1\2$ENV{CUDA_VERSION_OPTS}\3,s' tensorflow/core/common_runtime/gpu/gpu_device.cc
 fi
 
 # Invoke the cuda_config.sh and set up the TensorFlow's canonical view of the Cuda libraries
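Reviewer note on the `configure` change above: the script now derives an optional suffix (`TF_CUDA_EXT`, `TF_CUDNN_EXT`) from the requested version and looks for either the versioned library or the bare `.so` symlink. The following is only a rough Python sketch of that lookup, not the script's actual code; the helper names `versioned_library_name` and `find_cudnn` are hypothetical and exist purely for illustration.

```python
import os


def versioned_library_name(base, version):
    """Return e.g. 'libcudart.so.7.0', or 'libcudart.so' if version is empty.

    Mirrors the TF_CUDA_EXT / TF_CUDNN_EXT idea: an empty version means
    "use whatever the unversioned symlink points at".
    """
    suffix = "." + version if version else ""
    return base + suffix


def find_cudnn(install_path, version):
    """List the candidate cuDNN paths that exist (hypothetical helper).

    Checks both <install_path>/ and <install_path>/lib64/, as the
    configure script does.
    """
    name = versioned_library_name("libcudnn.so", version)
    candidates = [
        os.path.join(install_path, name),
        os.path.join(install_path, "lib64", name),
    ]
    return [path for path in candidates if os.path.exists(path)]
```

Both candidate paths are built from the same suffix, so an empty version consistently resolves to the unversioned symlink in either directory.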
diff --git a/tensorflow/contrib/README.md b/tensorflow/contrib/README.md
index 61ded454c8..e3c19b7ea2 100644
--- a/tensorflow/contrib/README.md
+++ b/tensorflow/contrib/README.md
@@ -4,8 +4,20 @@ Any code in this directory is not officially supported, and may change or be
 removed at any time without notice.
 
 The contrib directory contains project directories, each of which has designated
-owners.
+owners. It is meant to contain features and contributions that eventually should 
+get merged into core TensorFlow, but whose interfaces may still change, or which
+require some testing to see whether they can find broader acceptance. We are
+trying to keep duplication within contrib to a minimum, so you may be asked to 
+refactor code in contrib to use some feature inside core or in another project
+in contrib rather than reimplementing the feature.
 
 When adding a project, please stick to the following directory structure:
-Create a project directory in contrib/, and mirror the portions of the
-TensorFlow tree that your project requires underneath contrib/my_project/.
+Create a project directory in `contrib/`, and mirror the portions of the
+TensorFlow tree that your project requires underneath `contrib/my_project/`.
+
+For example, let's say you create foo ops in two files: `foo_ops.py` and 
+`foo_ops_test.py`. If you were to merge those files directly into TensorFlow, 
+they would live in `tensorflow/python/ops/foo_ops.py` and 
+`tensorflow/python/kernel_tests/foo_ops_test.py`. In `contrib/`, they are part 
+of project `foo`, and their full paths are `contrib/foo/python/ops/foo_ops.py`
+and `contrib/foo/python/kernel_tests/foo_ops_test.py`.
diff --git a/tensorflow/contrib/layers/__init__.py b/tensorflow/contrib/layers/__init__.py
index e978c4c8b6..676003325f 100644
--- a/tensorflow/contrib/layers/__init__.py
+++ b/tensorflow/contrib/layers/__init__.py
@@ -12,7 +12,52 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ==============================================================================
-"""layers: A module containing a higher level NN interface."""
+"""Ops for building neural network layers, regularizers, summaries, etc.
+
+## Higher level ops for building neural network layers.
+
+This package provides several ops that take care of creating variables that are
+used internally in a consistent way and provide the building blocks for many
+common machine learning algorithms.
+
+@@convolution2d
+@@fully_connected
+
+Aliases for fully_connected which set a default activation function are
+available: `relu`, `relu6` and `linear`.
+
+## Regularizers
+
+Regularization can help prevent overfitting. These have the signature
+`fn(weights)`. The loss is typically added to `tf.GraphKeys.REGULARIZATION_LOSS`
+
+@@l1_regularizer
+@@l2_regularizer
+
+## Initializers
+
+Initializers are used to initialize variables with sensible values given their
+size, data type, and purpose.
+
+@@xavier_initializer
+@@xavier_initializer_conv2d
+
+## Summaries
+
+Helper functions to summarize specific variables or ops.
+           
+@@summarize_activation
+@@summarize_tensor
+@@summarize_tensors
+@@summarize_collection
+
+The layers module defines convenience functions `summarize_variables`,
+`summarize_weights` and `summarize_biases`, which set the `collection` argument
+of `summarize_collection` to `VARIABLES`, `WEIGHTS` and `BIASES`, respectively.
+
+@@summarize_activations
+
+"""
 
 from __future__ import absolute_import
 from __future__ import division
diff --git a/tensorflow/contrib/layers/python/layers/layers.py b/tensorflow/contrib/layers/python/layers/layers.py
index 3376678f09..5b0cc415a7 100644
--- a/tensorflow/contrib/layers/python/layers/layers.py
+++ b/tensorflow/contrib/layers/python/layers/layers.py
@@ -14,49 +14,7 @@
 # ==============================================================================
 
 # pylint: disable=g-short-docstring-punctuation
-"""## Higher level ops related to regularization and building layers.
-
-This package provides several ops that take care of creating variables that are
-used internally in a consistent way and provide the building blocks for many
-common machine learning algorithms.
-
-@@convolution2d
-@@fully_connected
-
-Aliases for fully_connected which set a default activation function are
-available: `relu`, `relu6` and `linear`.
-
-## Regularizers
-
-Regularization can help prevent overfitting.
-These have the signature `fn(weights)`. The loss is typically added to
-`tf.GraphKeys.REGULARIZATION_LOSS`
-
-@@l1_regularizer
-@@l2_regularizer
-
-## Initializers
-
-Initializers are used to initialize variables with sensible values given their
-size, data type, and purpose.
-
-@@xavier_initializer
-@@xavier_initializer_conv2d
-
-## Summaries
-
-Helper functions to summarize specific variables or ops.
-
-@@summarize_activation
-@@summarize_tensor
-@@summarize_tensors
-@@summarize_collection
-@@summarize_variables
-@@summarize_weights
-@@summarize_biases
-@@summarize_activations
-
-"""
+"""Higher level ops for building layers."""
 
 from __future__ import absolute_import
 from __future__ import division
diff --git a/tensorflow/contrib/layers/python/layers/summaries.py b/tensorflow/contrib/layers/python/layers/summaries.py
index 3fad272043..16b336f6e8 100644
--- a/tensorflow/contrib/layers/python/layers/summaries.py
+++ b/tensorflow/contrib/layers/python/layers/summaries.py
@@ -35,7 +35,7 @@ __all__ = ['summarize_tensor', 'summarize_activation', 'summarize_tensors',
 def _assert_summary_tag_unique(tag):
   for summary in ops.get_collection(ops.GraphKeys.SUMMARIES):
     old_tag = tensor_util.constant_value(summary.op.inputs[0])
-    if tag == str(old_tag):
+    if tag.encode() == old_tag:
       raise ValueError('Conflict with summary tag: %s exists on summary %s %s' %
                        (tag, summary, old_tag))
 
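Reviewer note on the `summaries.py` change above: the switch from `str(old_tag)` to `tag.encode()` matters under Python 3, where `str()` of a bytes value yields its repr rather than the decoded text. The sketch below assumes the stored tag comes back as a plain `bytes` value (the real `constant_value` result may be wrapped differently), and both function names are hypothetical.

```python
def conflicts_old(tag, old_tag):
    # Pre-change comparison: stringify the stored tag.
    return tag == str(old_tag)


def conflicts_new(tag, old_tag):
    # Post-change comparison: encode the new tag and compare as bytes.
    return tag.encode() == old_tag


stored = b"weights/summary"  # assumed form of the stored tag

# Under Python 3, str(b"weights/summary") is "b'weights/summary'",
# so the old check misses a genuine duplicate.
missed = conflicts_old("weights/summary", stored)
caught = conflicts_new("weights/summary", stored)
```

Under Python 3 the old comparison fails to detect the duplicate tag, while the new one detects it; under Python 2 both comparisons agree because `str` and bytes are the same type.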
diff --git a/tensorflow/contrib/util/__init__.py b/tensorflow/contrib/util/__init__.py
index 8f6e1900cb..44380320d0 100644
--- a/tensorflow/contrib/util/__init__.py
+++ b/tensorflow/contrib/util/__init__.py
@@ -13,7 +13,14 @@
 # limitations under the License.
 # ==============================================================================
 
-"""contrib module containing volatile or experimental utility code."""
+"""Utilities for dealing with Tensors.
+
+## Miscellaneous Utility Functions
+
+@@constant_value
+@@make_tensor_proto
+
+"""
 
 from __future__ import absolute_import
 from __future__ import division
diff --git a/tensorflow/core/common_runtime/direct_session.cc b/tensorflow/core/common_runtime/direct_session.cc
index 6506b06096..57eb5999b2 100644
--- a/tensorflow/core/common_runtime/direct_session.cc
+++ b/tensorflow/core/common_runtime/direct_session.cc
@@ -535,7 +535,7 @@ Status DirectSession::CheckFetch(const NamedTensorList& feeds,
     pending_feeds.erase(id);
   }
 
-  // Initialize the stack with the fecth nodes.
+  // Initialize the stack with the fetch nodes.
   std::vector<const Node*> stack;
   for (const string& fetch : fetches) {
     TensorId id(ParseTensorName(fetch));
diff --git a/tensorflow/core/common_runtime/function_test.cc b/tensorflow/core/common_runtime/function_test.cc
index 3b61dc8697..53645a2061 100644
--- a/tensorflow/core/common_runtime/function_test.cc
+++ b/tensorflow/core/common_runtime/function_test.cc
@@ -304,7 +304,7 @@ TEST_F(FunctionLibraryRuntimeTest, ExpandInlineFunctions) {
   ExpandInlineFunctions(lib_, g);
   EXPECT_EQ(e2, DebugString(g));
 
-  // Get rid of redunant Identity nodes.
+  // Get rid of redundant Identity nodes.
   RemoveIdentityNodes(g);
   const char* e3 = R"P(
 (n2:float) -> (n42:float) {
@@ -683,7 +683,7 @@ TEST(OptimizationTest, RemoveDeadNodes) {
        {{"a"}, "Square", {"x"}, {{"T", T}}},
        // 1
        FDH::Const("o", 1),
-       // A bunch of extra arithmatic that y doesn't depend on
+       // A bunch of extra arithmetic that y doesn't depend on
        {{"x1"}, "Add", {"o", "o"}, {{"T", T}}},
        {{"x2"}, "Mul", {"a", "x1"}, {{"T", T}}},
        {{"x3"}, "Mul", {"x1", "x2"}, {{"T", T}}},
@@ -722,7 +722,7 @@ TEST(OptimizationTest, RemoveIdentityNodes_Ref) {
       // Nodes
       {// variable
        {{"v"}, "Variable", {}, {{"dtype", T}, {"shape", TensorShape({})}}},
-       // read the variable. Shouln't be removed.
+       // read the variable. Shouldn't be removed.
        {{"v_read"}, "Identity", {"v"}, {{"T", T}}},
        // returns v + v
        {{"ret"}, "Add", {"v_read", "v_read"}, {{"T", T}}}});
@@ -761,7 +761,7 @@ TEST(OptimizationTest, RemoveIdentityNodes) {
        {{"a"}, "Square", {"x"}, {{"T", T}}},
        // 1
        FDH::Const("o", 1),
-       // A bunch of extra arithmatic that y doesn't depend on
+       // A bunch of extra arithmetic that y doesn't depend on
        {{"x1"}, "Identity", {"a"}, {{"T", T}}},
        {{"x2"}, "Identity", {"x1"}, {{"T", T}}},
        {{"x3"}, "Identity", {"x2"}, {{"T", T}}},
diff --git a/tensorflow/core/framework/allocator.h b/tensorflow/core/framework/allocator.h
index b8e4a1f783..f904d5be8d 100644
--- a/tensorflow/core/framework/allocator.h
+++ b/tensorflow/core/framework/allocator.h
@@ -105,7 +105,7 @@ class Allocator {
 
   // Returns true if this allocator tracks the sizes of allocations.
   // RequestedSize and AllocatedSize must be overridden if
-  // TracksAlloctionSizes is overridden to return true.
+  // TracksAllocationSizes is overridden to return true.
   virtual bool TracksAllocationSizes() { return false; }
 
   // Returns true if this allocator requires tensors with 0 elements
diff --git a/tensorflow/core/framework/function.h b/tensorflow/core/framework/function.h
index 503e578fac..8231b91a3f 100644
--- a/tensorflow/core/framework/function.h
+++ b/tensorflow/core/framework/function.h
@@ -160,10 +160,10 @@ inline FunctionDefHelper::AttrValueWrapper::AttrValueWrapper(StringPiece val) {
 // "attr_values", which is a map from a placeholder name to an attr
 // value.
 //
-// InstatiateFunction calls "get_function" to find signatures of other
+// InstantiateFunction calls "get_function" to find signatures of other
 // functions and primitive ops.
 
-// Placeholders in "fdef" is substitued based on "attr_values" here.
+// Placeholders in "fdef" are substituted based on "attr_values" here.
 typedef ::tensorflow::protobuf::Map<string, AttrValue> InstantiateAttrValueMap;
 typedef gtl::ArraySlice<std::pair<string, FunctionDefHelper::AttrValueWrapper>>
     InstantiateAttrValueSlice;
@@ -329,7 +329,7 @@ class FunctionLibraryRuntime {
 //   std::function<Status(const AttrSlice&, FunctionDef*)>.
 //
 // A ::tensorflow::gradient::Creator should populate in FunctionDef* with a
-// definition of a brain function which computate the gradient for the
+// definition of a brain function which computes the gradient for the
 // <op_name> when the <op_name> is instantiated with the given attrs.
 //
 // E.g.,
diff --git a/tensorflow/core/framework/op_kernel.h b/tensorflow/core/framework/op_kernel.h
index a8c2d58b92..dc98044804 100644
--- a/tensorflow/core/framework/op_kernel.h
+++ b/tensorflow/core/framework/op_kernel.h
@@ -473,7 +473,7 @@ class OpKernelContext {
         // will never use eigen_gpu_device. It seems better to have
         // ensure_eigen_gpu_device fall through and regenerate the
         // nullptr every time an OpKernelContext is instantiated, than
-        // to do an unneccessary allocation of a dummy eigen GPU
+        // to do an unnecessary allocation of a dummy eigen GPU
         // device for CPU device Ops.
         eigen_gpu_device = device->MakeGpuDevice();
       }
@@ -1037,7 +1037,7 @@ typedef ::tensorflow::KernelDefBuilder Name;
   static ::tensorflow::kernel_factory::OpKernelRegistrar         \
       registrar__body__##ctr##__object(                          \
           ::tensorflow::register_kernel::kernel_builder.Build(), \
-          +[](::tensorflow::OpKernelConstruction* context)       \
+          [](::tensorflow::OpKernelConstruction* context)        \
               -> ::tensorflow::OpKernel* { return new __VA_ARGS__(context); })
 
 void* GlobalKernelRegistry();
diff --git a/tensorflow/core/graph/costmodel.h b/tensorflow/core/graph/costmodel.h
index f86fc6a141..037e6d4684 100644
--- a/tensorflow/core/graph/costmodel.h
+++ b/tensorflow/core/graph/costmodel.h
@@ -95,7 +95,7 @@ class CostModel {
   // Check that an estimate is available for every OP node in graph.
   void CheckInitialized(const Graph& graph) const;
 
-  // Helper routines to encapsulate static estimatation heuristics
+  // Helper routines to encapsulate static estimation heuristics
 
   // Compute an estimate of the time to copy "b" bytes over the network,
   // given a fixed cost of "network_latency_millis" milliseconds and
diff --git a/tensorflow/core/graph/graph.h b/tensorflow/core/graph/graph.h
index b4b91e115e..4ad2a306b2 100644
--- a/tensorflow/core/graph/graph.h
+++ b/tensorflow/core/graph/graph.h
@@ -15,7 +15,7 @@ limitations under the License.
 
 // A Graph describes a set of computations that are to be
 // performed, as well as the dependencies between those
-// compuations. The basic model is a DAG (directed acyclic graph) with
+// computations. The basic model is a DAG (directed acyclic graph) with
 // * internal nodes representing computational operations to be performed;
 // * edges represent dependencies, indicating the target may only be
 //   executed once the source has completed; and
diff --git a/tensorflow/core/graph/graph_def_builder.h b/tensorflow/core/graph/graph_def_builder.h
index 2a212bbc49..ec28343668 100644
--- a/tensorflow/core/graph/graph_def_builder.h
+++ b/tensorflow/core/graph/graph_def_builder.h
@@ -37,7 +37,7 @@ namespace tensorflow {
 //     node_builder.Input(input);
 //     return opts.FinalizeBuilder(&node_builder);
 //   }
-//   }  // namspace ops
+//   }  // namespace ops
 //
 //   // Or, alternatively:
 //   namespace ops {
@@ -45,7 +45,7 @@ namespace tensorflow {
 //     static const string kOpName = "Identity";
 //     return UnaryOp(kOpName, input, opts);
 //   }
-//   }  // namspace ops
+//   }  // namespace ops
 //
 // You call it like:
 //   GraphDefBuilder b;
diff --git a/tensorflow/core/graph/graph_partition.h b/tensorflow/core/graph/graph_partition.h
index 4ae0133977..5c69af0144 100644
--- a/tensorflow/core/graph/graph_partition.h
+++ b/tensorflow/core/graph/graph_partition.h
@@ -40,7 +40,7 @@ struct PartitionOptions {
 
   // A function that returns the incarnation of a device given the
   // device's fullname. If not found, GetIncarnationFunc should return
-  // kIlledgalIncarnation.
+  // kIllegalIncarnation.
   static const uint64 kIllegalIncarnation = 0;
   typedef std::function<uint64(const string&)> GetIncarnationFunc;
   GetIncarnationFunc get_incarnation = nullptr;
diff --git a/tensorflow/core/kernels/avgpooling_op.h b/tensorflow/core/kernels/avgpooling_op.h
index a275a32c3b..128ab2b3b9 100644
--- a/tensorflow/core/kernels/avgpooling_op.h
+++ b/tensorflow/core/kernels/avgpooling_op.h
@@ -41,7 +41,7 @@ struct SpatialAvgPooling {
 
 typedef Eigen::GpuDevice GPUDevice;
 
-// Lauch a custom GPU kernels from Yanqing for the avgpooling backward operation
+// Launch a custom GPU kernel from Yanqing for the avgpooling backward operation
 // that works NHWC data formats.
 // Arguments:
 //   top_diff: backprop to the output of the pooling layer
diff --git a/tensorflow/core/kernels/constant_op.cc b/tensorflow/core/kernels/constant_op.cc
index c4f8298e27..5f5d7cf8db 100644
--- a/tensorflow/core/kernels/constant_op.cc
+++ b/tensorflow/core/kernels/constant_op.cc
@@ -85,6 +85,7 @@ void HostConstantOp::Compute(OpKernelContext* ctx) {
   ctx->set_output(0, tensor_);
 }
 
+#if GOOGLE_CUDA
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
 // registration requires all int32 inputs and outputs to be in host memory.
@@ -93,6 +94,7 @@ REGISTER_KERNEL_BUILDER(Name("Const")
                             .HostMemory("output")
                             .TypeConstraint<int32>("dtype"),
                         HostConstantOp);
+#endif
 
 typedef Eigen::ThreadPoolDevice CPUDevice;
 typedef Eigen::GpuDevice GPUDevice;
@@ -178,10 +180,6 @@ REGISTER_KERNEL(GPU, int16);
 REGISTER_KERNEL(GPU, int64);
 // Currently we do not support filling strings and complex64 on GPU
 
-#endif  // GOOGLE_CUDA
-
-#undef REGISTER_KERNEL
-
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
 // registration requires all int32 inputs and outputs to be in host memory.
@@ -192,6 +190,9 @@ REGISTER_KERNEL_BUILDER(Name("Fill")
                             .HostMemory("value")
                             .HostMemory("output"),
                         FillOp<CPUDevice, int32>);
+#endif
+
+#undef REGISTER_KERNEL
 
 template <typename Device, typename T>
 class ZerosLikeOp : public OpKernel {
diff --git a/tensorflow/core/kernels/cwise_op_abs.cc b/tensorflow/core/kernels/cwise_op_abs.cc
index ca61a94391..1b976c7210 100644
--- a/tensorflow/core/kernels/cwise_op_abs.cc
+++ b/tensorflow/core/kernels/cwise_op_abs.cc
@@ -23,7 +23,6 @@ REGISTER_KERNEL_BUILDER(Name("ComplexAbs").Device(DEVICE_CPU),
 #endif
 #if GOOGLE_CUDA
 REGISTER3(UnaryOp, GPU, "Abs", functor::abs, float, double, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -34,5 +33,6 @@ REGISTER_KERNEL_BUILDER(Name("Abs")
                             .HostMemory("y")
                             .TypeConstraint<int32>("T"),
                         UnaryOp<CPUDevice, functor::abs<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_add.cc b/tensorflow/core/kernels/cwise_op_add.cc
index 49d9e0c507..8a9c1979e5 100644
--- a/tensorflow/core/kernels/cwise_op_add.cc
+++ b/tensorflow/core/kernels/cwise_op_add.cc
@@ -20,7 +20,6 @@ REGISTER8(BinaryOp, CPU, "Add", functor::add, float, double, int32, int64, int8,
           int16, complex64, string);
 #if GOOGLE_CUDA
 REGISTER3(BinaryOp, GPU, "Add", functor::add, float, double, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -32,5 +31,6 @@ REGISTER_KERNEL_BUILDER(Name("Add")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::add<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_div.cc b/tensorflow/core/kernels/cwise_op_div.cc
index 979db4c50c..e97b6b4360 100644
--- a/tensorflow/core/kernels/cwise_op_div.cc
+++ b/tensorflow/core/kernels/cwise_op_div.cc
@@ -21,7 +21,6 @@ REGISTER7(BinaryOp, CPU, "Div", functor::div, float, double, uint8, int16,
 #if GOOGLE_CUDA
 REGISTER5(BinaryOp, GPU, "Div", functor::div, float, double, uint8, int16,
           int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -33,5 +32,6 @@ REGISTER_KERNEL_BUILDER(Name("Div")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::div<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_equal_to.cc b/tensorflow/core/kernels/cwise_op_equal_to.cc
index 28801a49d6..1b744445ff 100644
--- a/tensorflow/core/kernels/cwise_op_equal_to.cc
+++ b/tensorflow/core/kernels/cwise_op_equal_to.cc
@@ -21,7 +21,6 @@ REGISTER9(BinaryOp, CPU, "Equal", functor::equal_to, float, double, uint8, int8,
 #if GOOGLE_CUDA
 REGISTER6(BinaryOp, GPU, "Equal", functor::equal_to, float, double, uint8, int8,
           int16, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -33,5 +32,6 @@ REGISTER_KERNEL_BUILDER(Name("Equal")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::equal_to<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_greater.cc b/tensorflow/core/kernels/cwise_op_greater.cc
index 7f746745bf..f860ea19ac 100644
--- a/tensorflow/core/kernels/cwise_op_greater.cc
+++ b/tensorflow/core/kernels/cwise_op_greater.cc
@@ -21,7 +21,6 @@ REGISTER7(BinaryOp, CPU, "Greater", functor::greater, float, double, int32,
 #if GOOGLE_CUDA
 REGISTER6(BinaryOp, GPU, "Greater", functor::greater, float, double, int64,
           uint8, int8, int16);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -33,5 +32,6 @@ REGISTER_KERNEL_BUILDER(Name("Greater")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::greater<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_greater_equal.cc b/tensorflow/core/kernels/cwise_op_greater_equal.cc
index adf06121a2..465c5a22ae 100644
--- a/tensorflow/core/kernels/cwise_op_greater_equal.cc
+++ b/tensorflow/core/kernels/cwise_op_greater_equal.cc
@@ -21,7 +21,6 @@ REGISTER7(BinaryOp, CPU, "GreaterEqual", functor::greater_equal, float, double,
 #if GOOGLE_CUDA
 REGISTER6(BinaryOp, GPU, "GreaterEqual", functor::greater_equal, float, double,
           int64, uint8, int8, int16);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -33,5 +32,6 @@ REGISTER_KERNEL_BUILDER(Name("GreaterEqual")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::greater_equal<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_less.cc b/tensorflow/core/kernels/cwise_op_less.cc
index 1710304703..e7acfa091f 100644
--- a/tensorflow/core/kernels/cwise_op_less.cc
+++ b/tensorflow/core/kernels/cwise_op_less.cc
@@ -21,7 +21,6 @@ REGISTER7(BinaryOp, CPU, "Less", functor::less, float, double, int32, int64,
 #if GOOGLE_CUDA
 REGISTER6(BinaryOp, GPU, "Less", functor::less, float, double, int64, uint8,
           int8, int16);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -33,5 +32,6 @@ REGISTER_KERNEL_BUILDER(Name("Less")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::less<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_less_equal.cc b/tensorflow/core/kernels/cwise_op_less_equal.cc
index 65f79b5799..3175dcae5e 100644
--- a/tensorflow/core/kernels/cwise_op_less_equal.cc
+++ b/tensorflow/core/kernels/cwise_op_less_equal.cc
@@ -21,7 +21,6 @@ REGISTER7(BinaryOp, CPU, "LessEqual", functor::less_equal, float, double, int32,
 #if GOOGLE_CUDA
 REGISTER6(BinaryOp, GPU, "LessEqual", functor::less_equal, float, double, int64,
           uint8, int8, int16);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -33,5 +32,6 @@ REGISTER_KERNEL_BUILDER(Name("LessEqual")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::less_equal<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_maximum.cc b/tensorflow/core/kernels/cwise_op_maximum.cc
index 732e6b48f2..a2fc480674 100644
--- a/tensorflow/core/kernels/cwise_op_maximum.cc
+++ b/tensorflow/core/kernels/cwise_op_maximum.cc
@@ -20,7 +20,6 @@ REGISTER4(BinaryOp, CPU, "Maximum", functor::maximum, float, double, int32,
           int64);
 #if GOOGLE_CUDA
 REGISTER3(BinaryOp, GPU, "Maximum", functor::maximum, float, double, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -32,5 +31,6 @@ REGISTER_KERNEL_BUILDER(Name("Maximum")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::maximum<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_minimum.cc b/tensorflow/core/kernels/cwise_op_minimum.cc
index 017daca4cc..0c6797ec86 100644
--- a/tensorflow/core/kernels/cwise_op_minimum.cc
+++ b/tensorflow/core/kernels/cwise_op_minimum.cc
@@ -20,7 +20,6 @@ REGISTER4(BinaryOp, CPU, "Minimum", functor::minimum, float, double, int32,
           int64);
 #if GOOGLE_CUDA
 REGISTER3(BinaryOp, GPU, "Minimum", functor::minimum, float, double, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -32,5 +31,6 @@ REGISTER_KERNEL_BUILDER(Name("Minimum")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::minimum<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sign.cc b/tensorflow/core/kernels/cwise_op_sign.cc
index 94f7ddd3b2..5b5ad90207 100644
--- a/tensorflow/core/kernels/cwise_op_sign.cc
+++ b/tensorflow/core/kernels/cwise_op_sign.cc
@@ -19,7 +19,6 @@ namespace tensorflow {
 REGISTER4(UnaryOp, CPU, "Sign", functor::sign, float, double, int32, int64);
 #if GOOGLE_CUDA
 REGISTER3(UnaryOp, GPU, "Sign", functor::sign, float, double, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -30,5 +29,6 @@ REGISTER_KERNEL_BUILDER(Name("Sign")
                             .HostMemory("y")
                             .TypeConstraint<int32>("T"),
                         UnaryOp<CPUDevice, functor::sign<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sub.cc b/tensorflow/core/kernels/cwise_op_sub.cc
index 8858db9e5f..b3727ec361 100644
--- a/tensorflow/core/kernels/cwise_op_sub.cc
+++ b/tensorflow/core/kernels/cwise_op_sub.cc
@@ -20,7 +20,6 @@ REGISTER5(BinaryOp, CPU, "Sub", functor::sub, float, double, int32, int64,
           complex64);
 #if GOOGLE_CUDA
 REGISTER3(BinaryOp, GPU, "Sub", functor::sub, float, double, int64);
-#endif
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -32,5 +31,6 @@ REGISTER_KERNEL_BUILDER(Name("Sub")
                             .HostMemory("z")
                             .TypeConstraint<int32>("T"),
                         BinaryOp<CPUDevice, functor::sub<int32>>);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/identity_op.cc b/tensorflow/core/kernels/identity_op.cc
index 22a26e8310..38a46583bd 100644
--- a/tensorflow/core/kernels/identity_op.cc
+++ b/tensorflow/core/kernels/identity_op.cc
@@ -47,6 +47,7 @@ REGISTER_GPU_KERNEL(bfloat16);
 
 #undef REGISTER_GPU_KERNEL
 
+#if GOOGLE_CUDA
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
 // registration requires all int32 inputs and outputs to be in host memory.
@@ -56,5 +57,6 @@ REGISTER_KERNEL_BUILDER(Name("Identity")
                             .HostMemory("output")
                             .TypeConstraint<int32>("T"),
                         IdentityOp);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/pad_op.cc b/tensorflow/core/kernels/pad_op.cc
index 286b74ca64..f3ad98ab0a 100644
--- a/tensorflow/core/kernels/pad_op.cc
+++ b/tensorflow/core/kernels/pad_op.cc
@@ -170,7 +170,6 @@ TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
                           PadOp<GPUDevice, T>)
 
 TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNEL);
-#endif  // GOOGLE_CUDA
 
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
@@ -182,5 +181,6 @@ REGISTER_KERNEL_BUILDER(Name("Pad")
                             .HostMemory("paddings")
                             .HostMemory("output"),
                         PadOp<CPUDevice, int32>);
+#endif
 
 }  // end namespace tensorflow
diff --git a/tensorflow/core/kernels/range_sampler.cc b/tensorflow/core/kernels/range_sampler.cc
index 58bd103c80..40be4bdb20 100644
--- a/tensorflow/core/kernels/range_sampler.cc
+++ b/tensorflow/core/kernels/range_sampler.cc
@@ -60,7 +60,7 @@ namespace {
 // We use batch_size and num_tries, where num_tries is the observed number of
 // tries it took to get batch_size unique values.
 //
-// Assuming (falsely) that the nubmer of tries to get a batch of batch_size
+// Assuming (falsely) that the number of tries to get a batch of batch_size
 // distinct values is _always_ num_tries, the probability that the value
 // is in a batch is (1 - (1-p)^num_tries)
 static float ExpectedCountHelper(float p, int batch_size, int num_tries) {
diff --git a/tensorflow/core/kernels/range_sampler.h b/tensorflow/core/kernels/range_sampler.h
index b2a1d44da1..2513f50b37 100644
--- a/tensorflow/core/kernels/range_sampler.h
+++ b/tensorflow/core/kernels/range_sampler.h
@@ -65,7 +65,7 @@ class RangeSampler {
   // Expected counts for the elements of the returned "batch" are reported
   // in the aligned array "batch_expected_count".
   //
-  // The user can optionally provide "extras", containg values in the range.
+  // The user can optionally provide "extras", containing values in the range.
   // The expected counts for the extras are reported in the aligned array
   // "extras_expected_count".
   //
diff --git a/tensorflow/core/kernels/reshape_op.cc b/tensorflow/core/kernels/reshape_op.cc
index 2413623684..1ee959c026 100644
--- a/tensorflow/core/kernels/reshape_op.cc
+++ b/tensorflow/core/kernels/reshape_op.cc
@@ -30,6 +30,7 @@ REGISTER_KERNEL_BUILDER(Name("Reshape").Device(DEVICE_CPU).HostMemory("shape"),
 TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
 #undef REGISTER_GPU_KERNEL
 
+#if GOOGLE_CUDA
 // A special GPU kernel for int32.
 // TODO(b/25387198): Also enable int32 in device memory. This kernel
 // registration requires all int32 inputs and outputs to be in host memory.
@@ -40,5 +41,6 @@ REGISTER_KERNEL_BUILDER(Name("Reshape")
                             .HostMemory("output")
                             .TypeConstraint<int32>("T"),
                         ReshapeOp);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/scatter_op.cc b/tensorflow/core/kernels/scatter_op.cc
index afc03efbca..30fd105b5f 100644
--- a/tensorflow/core/kernels/scatter_op.cc
+++ b/tensorflow/core/kernels/scatter_op.cc
@@ -15,6 +15,8 @@ limitations under the License.
 
 // See docs in ../ops/state_ops.cc.
 
+#include "tensorflow/core/kernels/scatter_op.h"
+
 #include "tensorflow/core/framework/op_kernel.h"
 #include "tensorflow/core/framework/register_types.h"
 #include "tensorflow/core/framework/tensor.h"
@@ -23,33 +25,38 @@ limitations under the License.
 
 namespace tensorflow {
 
-enum class UpdateOp { ASSIGN, ADD, SUB };
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace {
 
-template <UpdateOp Op>
+template <scatter_op::UpdateOp Op>
 struct Assign {};
 template <>
-struct Assign<UpdateOp::ASSIGN> {
+struct Assign<scatter_op::UpdateOp::ASSIGN> {
   template <typename Params, typename Update>
   static void Run(Params p, Update u) {
     p = u;
   }
 };
 template <>
-struct Assign<UpdateOp::ADD> {
+struct Assign<scatter_op::UpdateOp::ADD> {
   template <typename Params, typename Update>
   static void Run(Params p, Update u) {
     p += u;
   }
 };
 template <>
-struct Assign<UpdateOp::SUB> {
+struct Assign<scatter_op::UpdateOp::SUB> {
   template <typename Params, typename Update>
   static void Run(Params p, Update u) {
     p -= u;
   }
 };
 
-template <class T, typename Index, UpdateOp op>
+}  // namespace
+
+template <typename Device, typename T, typename Index, scatter_op::UpdateOp op>
 class ScatterUpdateOp : public OpKernel {
  public:
   //   QUESTION: It'd be nice to support DT_INT16, DT_UINT8,
@@ -108,85 +115,136 @@ class ScatterUpdateOp : public OpKernel {
             "updates.shape ", Tupdates.shape().DebugString(),
             ", indices.shape ", Tindices.shape().DebugString(),
             ", params.shape ", Tparams.shape().DebugString()));
-    const Index N = Tindices.NumElements();
 
     // We always return the input ref.
     c->forward_ref_input_to_ref_output(0, 0);
 
+    const Index N = Tindices.NumElements();
     if (N > 0) {
-      const Index first_dim_size = Tparams.dim_size(0);
-      // Validate all the indices are in range
-      auto Tindices_vec = Tindices.flat<Index>();
-      for (Index i = 0; i < N; i++) {
-        const Index index = Tindices_vec(i);
-        OP_REQUIRES(c, index >= 0 && index < first_dim_size,
-                    errors::InvalidArgument(
-                        strings::StrCat("Index ", index, " at offset ", i,
-                                        " in indices is out of range")));
-      }
+      auto Tindices_flat = Tindices.flat<Index>();
       auto Tparams_flat = Tparams.flat_outer_dims<T>();
       auto Tupdates_flat =
           Tupdates.shaped<T, 2>({N, Tupdates.NumElements() / N});
-      for (Index i = 0; i < N; i++) {
-        // Copy last Ndim-1 dimensions of Tupdates[i] to
-        // Tparams[Tindices[i]]
-        Assign<op>::Run(Tparams_flat.template chip<0>(Tindices_vec(i)),
-                        Tupdates_flat.template chip<0>(i));
-      }
+
+      functor::ScatterFunctor<Device, T, Index, op> functor;
+      functor(c, c->template eigen_device<Device>(),
+              Tparams_flat, Tupdates_flat, Tindices_flat);
     }
   }
 };
 
-#define REGISTER_SCATTER_UPDATE(type, index_type)  \
-  REGISTER_KERNEL_BUILDER(                         \
-      Name("ScatterUpdate")                        \
-          .Device(DEVICE_CPU)                      \
-          .TypeConstraint<type>("T")               \
-          .TypeConstraint<index_type>("Tindices"), \
-      ScatterUpdateOp<type, index_type, UpdateOp::ASSIGN>);
+namespace functor {
+// Implementation of update functor for CPU.
+template <typename T, typename Index, scatter_op::UpdateOp op>
+struct ScatterFunctor<CPUDevice, T, Index, op> {
+  void operator()(OpKernelContext* c, const CPUDevice& d,
+                  typename TTypes<T>::Matrix params,
+                  typename TTypes<T>::ConstMatrix updates,
+                  typename TTypes<Index>::ConstFlat indices) {
+    Index N = indices.size();
+    // Validate all the indices are in range
+    Index first_dim_size = params.dimension(0);
+    for (Index i = 0; i < N; i++) {
+      const Index index = indices(i);
+      OP_REQUIRES(c, index >= 0 && index < first_dim_size,
+                  errors::InvalidArgument(
+                      strings::StrCat("Index ", index, " at offset ", i,
+                                      " in indices is out of range")));
+    }
+    for (Index i = 0; i < N; i++) {
+      // Copy last Ndim-1 dimensions of updates[i] to
+      // params[indices[i]]
+      Assign<op>::Run(params.template chip<0>(indices(i)),
+                      updates.template chip<0>(i));
+    }
+  }
+};
+}  // namespace functor
 
-#define REGISTER_SCATTER_UPDATE_INT32(type) REGISTER_SCATTER_UPDATE(type, int32)
-#define REGISTER_SCATTER_UPDATE_INT64(type) REGISTER_SCATTER_UPDATE(type, int64)
+#define REGISTER_SCATTER_KERNEL_INDEX(type, index_type, dev, name, op)  \
+  REGISTER_KERNEL_BUILDER(                                              \
+      Name(name)                                                        \
+      .Device(DEVICE_##dev)                                             \
+      .TypeConstraint<type>("T")                                        \
+      .TypeConstraint<index_type>("Tindices"),                          \
+      ScatterUpdateOp<dev##Device, type, index_type, op>)
 
-TF_CALL_ALL_TYPES(REGISTER_SCATTER_UPDATE_INT32);
-TF_CALL_ALL_TYPES(REGISTER_SCATTER_UPDATE_INT64);
+#define REGISTER_SCATTER_KERNEL(type, dev, name, op)            \
+  REGISTER_SCATTER_KERNEL_INDEX(type, int32, dev, name, op);    \
+  REGISTER_SCATTER_KERNEL_INDEX(type, int64, dev, name, op);
 
-#undef REGISTER_SCATTER_UPDATE_INT64
-#undef REGISTER_SCATTER_UPDATE_INT32
-#undef REGISTER_SCATTER_UPDATE
+#define REGISTER_SCATTER_ADD_SUB(type, dev)                 \
+  REGISTER_SCATTER_KERNEL(                                  \
+      type, dev, "ScatterAdd", scatter_op::UpdateOp::ADD);  \
+  REGISTER_SCATTER_KERNEL(                                  \
+      type, dev, "ScatterSub", scatter_op::UpdateOp::SUB);
 
-#define REGISTER_SCATTER_ADD(type, index_type)                         \
-  REGISTER_KERNEL_BUILDER(Name("ScatterAdd")                           \
-                              .Device(DEVICE_CPU)                      \
-                              .TypeConstraint<type>("T")               \
-                              .TypeConstraint<index_type>("Tindices"), \
-                          ScatterUpdateOp<type, index_type, UpdateOp::ADD>);
+#define REGISTER_SCATTER_UPDATE(type, dev)                  \
+  REGISTER_SCATTER_KERNEL(                                  \
+      type, dev, "ScatterUpdate", scatter_op::UpdateOp::ASSIGN);
 
-#define REGISTER_SCATTER_ADD_INT32(type) REGISTER_SCATTER_ADD(type, int32)
-#define REGISTER_SCATTER_ADD_INT64(type) REGISTER_SCATTER_ADD(type, int64)
+// Registers CPU kernels.
+#define REGISTER_SCATTER_ADD_SUB_CPU(type)      \
+  REGISTER_SCATTER_ADD_SUB(type, CPU);
 
-TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_ADD_INT32);
-TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_ADD_INT64);
+#define REGISTER_SCATTER_UPDATE_CPU(type)       \
+  REGISTER_SCATTER_UPDATE(type, CPU);
 
-#undef REGISTER_SCATTER_ADD_INT32
-#undef REGISTER_SCATTER_ADD_INT64
-#undef REGISTER_SCATTER_ADD
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_ADD_SUB_CPU);
+TF_CALL_ALL_TYPES(REGISTER_SCATTER_UPDATE_CPU);
+
+// Registers GPU kernels.
+#if GOOGLE_CUDA
+#define REGISTER_SCATTER_ADD_SUB_GPU(type)      \
+  REGISTER_SCATTER_ADD_SUB(type, GPU);
 
-#define REGISTER_SCATTER_SUB(type, index_type)                         \
-  REGISTER_KERNEL_BUILDER(Name("ScatterSub")                           \
-                              .Device(DEVICE_CPU)                      \
-                              .TypeConstraint<type>("T")               \
-                              .TypeConstraint<index_type>("Tindices"), \
-                          ScatterUpdateOp<type, index_type, UpdateOp::SUB>);
+#define REGISTER_SCATTER_UPDATE_GPU(type)       \
+  REGISTER_SCATTER_UPDATE(type, GPU);
 
-#define REGISTER_SCATTER_SUB_INT32(type) REGISTER_SCATTER_SUB(type, int32)
-#define REGISTER_SCATTER_SUB_INT64(type) REGISTER_SCATTER_SUB(type, int64)
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_SCATTER_ADD_SUB_GPU);
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_SCATTER_UPDATE_GPU);
 
-TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_SUB_INT32);
-TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_SUB_INT64);
+#endif  // GOOGLE_CUDA
 
-#undef REGISTER_SCATTER_SUB_INT64
-#undef REGISTER_SCATTER_SUB_INT32
-#undef REGISTER_SCATTER_SUB
+#undef REGISTER_SCATTER_ADD
+#undef REGISTER_SCATTER_ADD_SUB
+#undef REGISTER_SCATTER_ADD_SUB_CPU
+#undef REGISTER_SCATTER_ADD_SUB_GPU
+#undef REGISTER_SCATTER_UPDATE
+#undef REGISTER_SCATTER_UPDATE_CPU
+#undef REGISTER_SCATTER_UPDATE_GPU
+#undef REGISTER_SCATTER_KERNEL
+#undef REGISTER_SCATTER_KERNEL_INDEX
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+
+#define DECLARE_GPU_SPECS_OP(T, Index, op)                              \
+  template <>                                                           \
+  void ScatterFunctor<GPUDevice, T, Index, op>::operator()(             \
+      OpKernelContext* c, const GPUDevice& d,                           \
+      typename TTypes<T>::Matrix params,                                \
+      typename TTypes<T>::ConstMatrix updates,                          \
+      typename TTypes<Index>::ConstFlat indices);                       \
+  extern template struct ScatterFunctor<GPUDevice, T, Index, op>;
+
+#define DECLARE_GPU_SPECS_INDEX(T, Index)                       \
+  DECLARE_GPU_SPECS_OP(T, Index, scatter_op::UpdateOp::ASSIGN); \
+  DECLARE_GPU_SPECS_OP(T, Index, scatter_op::UpdateOp::ADD);    \
+  DECLARE_GPU_SPECS_OP(T, Index, scatter_op::UpdateOp::SUB);
+
+#define DECLARE_GPU_SPECS(T)                    \
+  DECLARE_GPU_SPECS_INDEX(T, int32);            \
+  DECLARE_GPU_SPECS_INDEX(T, int64);
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
+
+#undef DECLARE_GPU_SPECS
+#undef DECLARE_GPU_SPECS_INDEX
+#undef DECLARE_GPU_SPECS_OP
+
+}  // namespace functor
+#endif  // GOOGLE_CUDA
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/scatter_op.h b/tensorflow/core/kernels/scatter_op.h
new file mode 100644
index 0000000000..b7c7df97a7
--- /dev/null
+++ b/tensorflow/core/kernels/scatter_op.h
@@ -0,0 +1,48 @@
+/* Copyright 2015 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#ifndef TENSORFLOW_KERNELS_SCATTER_OP_H_
+#define TENSORFLOW_KERNELS_SCATTER_OP_H_
+
+// Functor definitions for Scatter ops. Must be compilable by nvcc.
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+class OpKernelContext;
+
+namespace scatter_op {
+
+enum class UpdateOp { ASSIGN, ADD, SUB };
+
+}  // namespace scatter_op
+
+namespace functor {
+
+// Functor used by ScatterOp to do the computations.
+template <typename Device, typename T, typename Index, scatter_op::UpdateOp op>
+struct ScatterFunctor {
+  void operator()(OpKernelContext* c, const Device& d,
+                  typename TTypes<T>::Matrix params,
+                  typename TTypes<T>::ConstMatrix updates,
+                  typename TTypes<Index>::ConstFlat indices);
+};
+
+}  // namespace functor
+}  // namespace tensorflow
+
+#endif  // TENSORFLOW_KERNELS_SCATTER_OP_H_
diff --git a/tensorflow/core/kernels/scatter_op_gpu.cu.cc b/tensorflow/core/kernels/scatter_op_gpu.cu.cc
new file mode 100644
index 0000000000..6ef23419ab
--- /dev/null
+++ b/tensorflow/core/kernels/scatter_op_gpu.cu.cc
@@ -0,0 +1,108 @@
+/* Copyright 2015 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/scatter_op.h"
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/platform/types.h"
+#include "tensorflow/core/util/cuda_kernel_helper.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename T, typename Index, scatter_op::UpdateOp op>
+__global__ void ScatterOpCustomKernel(
+    T* params, const T* updates, const Index* indices,
+    Index first_dim_size, Index updates_size, Index indices_size) {
+  Index update_block = updates_size / indices_size;
+  CUDA_1D_KERNEL_LOOP(i, updates_size) {
+    Index indices_i = i / update_block;
+    Index updates_i = i;
+    Index param_first_index = indices[indices_i];
+    if (!(param_first_index >= 0 && param_first_index < first_dim_size)) {
+      // Ignore indices that are out of range.
+      continue;
+    }
+    Index params_i = param_first_index * update_block + (i % update_block);
+    switch (op) {
+      case scatter_op::UpdateOp::ASSIGN: {
+        params[params_i] = ldg(updates + updates_i);
+        break;
+      }
+      case scatter_op::UpdateOp::ADD: {
+        CudaAtomicAdd(params + params_i, ldg(updates + updates_i));
+        break;
+      }
+      case scatter_op::UpdateOp::SUB: {
+        CudaAtomicSub(params + params_i, ldg(updates + updates_i));
+        break;
+      }
+    }
+  }
+}
+
+namespace functor {
+// Specialization for a GPU device.
+template <typename T, typename Index, scatter_op::UpdateOp op>
+struct ScatterFunctor<GPUDevice, T, Index, op> {
+  void operator()(OpKernelContext* c, const GPUDevice& d,
+                  typename TTypes<T>::Matrix params,
+                  typename TTypes<T>::ConstMatrix updates,
+                  typename TTypes<Index>::ConstFlat indices) {
+    // TODO: Implement indices range check.  The hardest part is returning a
+    // value after the range check, as we do not want to do a device-to-host
+    // memcpy during a stream.
+    const Index first_dim_size = params.dimension(0);
+    const Index indices_size = indices.size();
+    const Index updates_size = updates.size();
+    CudaLaunchConfig config = GetCudaLaunchConfig(updates_size, d);
+    ScatterOpCustomKernel<T, Index, op>
+        <<<config.block_count, config.thread_per_block, 0, d.stream()>>>(
+            params.data(), updates.data(), indices.data(),
+            first_dim_size, updates_size, indices_size);
+  }
+};
+
+}  // namespace functor
+
+#define DEFINE_GPU_SPECS_OP(T, Index, op)                               \
+  template struct functor::ScatterFunctor<GPUDevice, T, Index, op>;
+
+#define DEFINE_GPU_SPECS_INDEX(T, Index)                        \
+  DEFINE_GPU_SPECS_OP(T, Index, scatter_op::UpdateOp::ASSIGN);  \
+  DEFINE_GPU_SPECS_OP(T, Index, scatter_op::UpdateOp::ADD);     \
+  DEFINE_GPU_SPECS_OP(T, Index, scatter_op::UpdateOp::SUB);
+
+#define DEFINE_GPU_SPECS(T)                     \
+  DEFINE_GPU_SPECS_INDEX(T, int32);             \
+  DEFINE_GPU_SPECS_INDEX(T, int64);
+
+DEFINE_GPU_SPECS(float);
+DEFINE_GPU_SPECS(double);
+// TODO: The following fails to compile.
+// TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);
+
+#undef DEFINE_GPU_SPECS
+#undef DEFINE_GPU_SPECS_INDEX
+#undef DEFINE_GPU_SPECS_OP
+
+}  // namespace tensorflow
+
+#endif  // GOOGLE_CUDA
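
Note on the kernel above: each GPU thread handles one element of the flattened `updates` buffer, derives which entry of `indices` it belongs to by dividing by `update_block`, and silently skips rows that fall outside `[0, first_dim_size)`. The host-side C++ sketch below mirrors that index arithmetic for the ADD case so it can be checked without a GPU. It is an illustration only, not part of this change: `ScatterAddReference` and its parameters are hypothetical names, and it assumes row-major flattened buffers, `updates.size()` divisible by `indices.size()`, and `params` holding at least `first_dim_size * update_block` elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the index mapping used by the GPU kernel.
// params is a flattened [first_dim_size, update_block] matrix and updates is
// a flattened [indices.size(), update_block] matrix (both row-major).
inline void ScatterAddReference(std::vector<float>& params,
                                const std::vector<float>& updates,
                                const std::vector<int32_t>& indices,
                                int64_t first_dim_size) {
  if (indices.empty()) return;
  const int64_t update_block =
      static_cast<int64_t>(updates.size() / indices.size());
  if (update_block == 0) return;
  for (int64_t i = 0; i < static_cast<int64_t>(updates.size()); ++i) {
    // Which row of params this update element targets.
    const int64_t row = indices[i / update_block];
    // Out-of-range rows are ignored, matching the kernel's behavior.
    if (row < 0 || row >= first_dim_size) continue;
    params[row * update_block + (i % update_block)] += updates[i];
  }
}
```

With duplicate indices the serial loop accumulates both rows deterministically; the GPU version relies on atomic adds for the same effect, so the final sums should agree up to floating-point reordering.
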
diff --git a/tensorflow/core/kernels/scatter_op_test.cc b/tensorflow/core/kernels/scatter_op_test.cc
index cb6abbe75f..1d6285882c 100644
--- a/tensorflow/core/kernels/scatter_op_test.cc
+++ b/tensorflow/core/kernels/scatter_op_test.cc
@@ -243,6 +243,7 @@ static void BM_ScatterHelper(int iters, int embedding_size, const char* op) {
   testing::StopTiming();
   const int kRows = 10000000 / embedding_size;
   std::vector<float> values;
+  values.reserve(kRows * embedding_size);
   for (int i = 0; i < kRows * embedding_size; i++) {
     values.push_back(i);
   }
@@ -270,6 +271,7 @@ static void BM_ScatterHelper(int iters, int embedding_size, const char* op) {
   while (iters-- > 0) {
     Status s = bm.RunOpKernel();
   }
+  testing::StopTiming();
 }
 
 static void BM_ScatterUpdateInt32(int iters, int embedding_size) {
diff --git a/tensorflow/core/kernels/shape_ops.cc b/tensorflow/core/kernels/shape_ops.cc
index c2ba5ac275..b2c8be925a 100644
--- a/tensorflow/core/kernels/shape_ops.cc
+++ b/tensorflow/core/kernels/shape_ops.cc
@@ -43,6 +43,7 @@ class ShapeOp : public OpKernel {
 REGISTER_KERNEL_BUILDER(Name("Shape").Device(DEVICE_CPU).HostMemory("output"),
                         ShapeOp);
 
+#if GOOGLE_CUDA
 #define REGISTER_GPU_KERNEL(type)                         \
   REGISTER_KERNEL_BUILDER(Name("Shape")                   \
                               .Device(DEVICE_GPU)         \
@@ -61,6 +62,7 @@ REGISTER_KERNEL_BUILDER(Name("Shape")
                             .HostMemory("output")
                             .TypeConstraint<int32>("T"),
                         ShapeOp);
+#endif
 
 class ShapeNOp : public OpKernel {
  public:
@@ -82,6 +84,7 @@ class ShapeNOp : public OpKernel {
 REGISTER_KERNEL_BUILDER(Name("ShapeN").Device(DEVICE_CPU).HostMemory("output"),
                         ShapeNOp);
 
+#if GOOGLE_CUDA
 #define REGISTER_GPU_KERNEL(type)                         \
   REGISTER_KERNEL_BUILDER(Name("ShapeN")                  \
                               .Device(DEVICE_GPU)         \
@@ -100,6 +103,7 @@ REGISTER_KERNEL_BUILDER(Name("ShapeN")
                             .HostMemory("output")
                             .TypeConstraint<int32>("T"),
                         ShapeNOp);
+#endif
 
 class RankOp : public OpKernel {
  public:
@@ -118,6 +122,7 @@ class RankOp : public OpKernel {
 REGISTER_KERNEL_BUILDER(Name("Rank").Device(DEVICE_CPU).HostMemory("output"),
                         RankOp);
 
+#if GOOGLE_CUDA
 #define REGISTER_GPU_KERNEL(type)                        \
   REGISTER_KERNEL_BUILDER(Name("Rank")                   \
                               .Device(DEVICE_GPU)        \
@@ -143,6 +148,7 @@ REGISTER_KERNEL_BUILDER(Name("Rank")
                             .HostMemory("input")
                             .HostMemory("output"),
                         RankOp);
+#endif
 
 class SizeOp : public OpKernel {
  public:
@@ -162,6 +168,7 @@ class SizeOp : public OpKernel {
 REGISTER_KERNEL_BUILDER(Name("Size").Device(DEVICE_CPU).HostMemory("output"),
                         SizeOp);
 
+#if GOOGLE_CUDA
 #define REGISTER_GPU_KERNEL(type)                        \
   REGISTER_KERNEL_BUILDER(Name("Size")                   \
                               .Device(DEVICE_GPU)        \
@@ -180,6 +187,7 @@ REGISTER_KERNEL_BUILDER(Name("Size")
                             .HostMemory("input")
                             .HostMemory("output"),
                         SizeOp);
+#endif
 
 class ExpandDimsOp : public OpKernel {
  public:
@@ -225,6 +233,7 @@ class ExpandDimsOp : public OpKernel {
 REGISTER_KERNEL_BUILDER(Name("ExpandDims").Device(DEVICE_CPU).HostMemory("dim"),
                         ExpandDimsOp);
 
+#if GOOGLE_CUDA
 #define REGISTER_GPU_KERNEL(type)                        \
   REGISTER_KERNEL_BUILDER(Name("ExpandDims")             \
                               .Device(DEVICE_GPU)        \
@@ -241,6 +250,7 @@ REGISTER_KERNEL_BUILDER(Name("ExpandDims")
                             .HostMemory("dim")
                             .HostMemory("output"),
                         ExpandDimsOp);
+#endif
 
 class SqueezeOp : public OpKernel {
  public:
@@ -313,6 +323,7 @@ class SqueezeOp : public OpKernel {
 
 REGISTER_KERNEL_BUILDER(Name("Squeeze").Device(DEVICE_CPU), SqueezeOp);
 
+#if GOOGLE_CUDA
 #define REGISTER_GPU_KERNEL(type)                                   \
   REGISTER_KERNEL_BUILDER(                                          \
       Name("Squeeze").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
@@ -329,5 +340,6 @@ REGISTER_KERNEL_BUILDER(Name("Squeeze")
                             .HostMemory("input")
                             .HostMemory("output"),
                         SqueezeOp);
+#endif
 
 }  // namespace tensorflow
diff --git a/tensorflow/core/kernels/sparse_matmul_op.cc b/tensorflow/core/kernels/sparse_matmul_op.cc
index 1aa5ae0ab5..69b53500a1 100644
--- a/tensorflow/core/kernels/sparse_matmul_op.cc
+++ b/tensorflow/core/kernels/sparse_matmul_op.cc
@@ -67,7 +67,7 @@ static const int N = 128;
 // Note that all the data/indices of all the blocks are stored in the same
 // vectors respectively. To identify block boundaries, we store the block
 // offsets using index3_offset/index_offset. If there are n blocks in the slice,
-// index3_offset and index_offset have n entires. The indices for the ith block
+// index3_offset and index_offset have n entries. The indices for the ith block
 // are the values in the following range:
 // [index3[index3_offset[i-1]], index3[index3_offset[i]]). Similarly for
 // index_offset.
@@ -475,7 +475,7 @@ class SparseMatMulOp : public OpKernel {
     if (!a_is_sparse_ && !b_is_sparse_) {
       // Fallback to Eigen contract.
       // Note that we currently don't optimize the case where only right is
-      // sparse. That can generally be handled by tranposing the order of the
+      // sparse. That can generally be handled by transposing the order of the
       // matmul.
       Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1> dim_pair;
       dim_pair[0].first = transpose_a_ ? 0 : 1;
@@ -540,7 +540,7 @@ class SparseMatMulOp : public OpKernel {
   // Encodes "mat" using a sparse representation and stores that in
   // "mat_slices". "mat" is broken into a grid with sizes "slice_num_rows" and
   // "slice_num_cols", each grid element is converted into a SparseSlice and
-  // stored in mat_slices. "slice_block_size" is used to perform futher column
+  // stored in mat_slices. "slice_block_size" is used to perform further column
   // blocking of each slice.
   static inline BlockingCounter* CreateSparseSlices(
       const ConstMatrixMap& mat, bool transpose, int slice_num_rows,
@@ -776,7 +776,7 @@ inline void SparseMatMulOp::ComputeBlockSizes(const ConstMatrixMap& left,
   *KR = std::min(static_cast<int>(right.dimension(0)), mem / 256);
   *NR = right.dimension(1);
   if (*KR * *NR > mem) {
-    // 4096 may be enough to ammortize the cost of writes.
+    // 4096 may be enough to amortize the cost of writes.
     *KR = std::min<int>(*KR, 4096);
   }
   // Use sizes that are multiples of K and 256.
diff --git a/tensorflow/core/lib/core/command_line_flags.cc b/tensorflow/core/lib/core/command_line_flags.cc
index 26e495a520..1cfc01b6e2 100644
--- a/tensorflow/core/lib/core/command_line_flags.cc
+++ b/tensorflow/core/lib/core/command_line_flags.cc
@@ -43,7 +43,7 @@ bool StringToValue<string>(const string& content, string* value) {
 // Return OK if the argument is used. It store the extracted value into the
 // matching flag.
 // Return NOT_FOUND if the argument is not recognized.
-// Retrun INVALID_ARGUMENT if the command is recognized, but fails to extract
+// Return INVALID_ARGUMENT if the command is recognized, but fails to extract
 // its value.
 template <typename T>
 Status ParseArgument(const string& argument) {
diff --git a/tensorflow/core/lib/gtl/inlined_vector.h b/tensorflow/core/lib/gtl/inlined_vector.h
index 6b2cba809c..1f35c0cab8 100644
--- a/tensorflow/core/lib/gtl/inlined_vector.h
+++ b/tensorflow/core/lib/gtl/inlined_vector.h
@@ -374,7 +374,7 @@ class InlinedVector {
 
   // Moves srcs[0,n-1] contents to dst[0,n-1].
   static void Move(const T* src, size_t n, T* dst) {
-    for (int i = 0; i < n; i++) {
+    for (size_t i = 0; i < n; i++) {
       new (dst + i) T(std::move(*(src + i)));
     }
   }
diff --git a/tensorflow/core/lib/jpeg/jpeg_handle.h b/tensorflow/core/lib/jpeg/jpeg_handle.h
index 04b092f053..4beca30c3f 100644
--- a/tensorflow/core/lib/jpeg/jpeg_handle.h
+++ b/tensorflow/core/lib/jpeg/jpeg_handle.h
@@ -14,7 +14,7 @@ limitations under the License.
 ==============================================================================*/
 
 // This file declares the functions and structures for memory I/O with libjpeg
-// These functions are not meant to be used directly, see jpeg_mem.h isntead.
+// These functions are not meant to be used directly, see jpeg_mem.h instead.
 
 #ifndef TENSORFLOW_LIB_JPEG_JPEG_HANDLE_H_
 #define TENSORFLOW_LIB_JPEG_JPEG_HANDLE_H_
diff --git a/tensorflow/core/lib/random/distribution_sampler.cc b/tensorflow/core/lib/random/distribution_sampler.cc
index 1fc1397499..3daaa1447c 100644
--- a/tensorflow/core/lib/random/distribution_sampler.cc
+++ b/tensorflow/core/lib/random/distribution_sampler.cc
@@ -42,7 +42,7 @@ DistributionSampler::DistributionSampler(
   std::vector<int> low;
   low.reserve(n);
 
-  // compute propotional weights
+  // compute proportional weights
   for (int i = 0; i < n; i++) {
     double p = (weights[i] * n) / sum;
     pr[i] = p;
diff --git a/tensorflow/core/lib/random/random_distributions.h b/tensorflow/core/lib/random/random_distributions.h
index 9f510342d7..a2ee5c96aa 100644
--- a/tensorflow/core/lib/random/random_distributions.h
+++ b/tensorflow/core/lib/random/random_distributions.h
@@ -192,7 +192,7 @@ class SingleSampleAdapter {
 //              each invocation. It needs to define kResultElementCount for the
 //              sample count for each invocation, and ResultType for actual
 //              returned sample type.
-//   RealType: the data type of the real numberes that will be returned by the
+//   RealType: the data type of the real numbers that will be returned by the
 //             distribution. This could be either float or double for now.
 // This class is meant to be implemented through specialization. The default
 // is not defined by design.
@@ -259,7 +259,7 @@ class NormalDistribution<Generator, double> {
 //              each invocation. It needs to define kResultElementCount for the
 //              sample count for each invocation, and ResultType for actual
 //              returned sample type.
-//   RealType: the data type of the real numberes that will be returned by the
+//   RealType: the data type of the real numbers that will be returned by the
 //             distribution. This could be either float or double for now.
 // This class is meant to be implemented through specialization. The default
 // is not defined by design.
diff --git a/tensorflow/core/lib/random/random_distributions_test.cc b/tensorflow/core/lib/random/random_distributions_test.cc
index 4d3e4a5cdc..13398b838f 100644
--- a/tensorflow/core/lib/random/random_distributions_test.cc
+++ b/tensorflow/core/lib/random/random_distributions_test.cc
@@ -36,7 +36,7 @@ namespace {
 static constexpr float kZLimit = 6.0;
 
 // A utility function to fill the given array with samples from the given
-// distribution, using the single adatper of the underlying generator
+// distribution, using the single adapter of the underlying generator
 template <class Distribution>
 void FillRandomsWithSingles(PhiloxRandom gen,
                             typename Distribution::ResultElementType* p,
@@ -87,7 +87,7 @@ bool CheckSamplesMoments(const std::vector<T>& samples,
         break;
       }
       // moments[i] store the i-th order measured moments.
-      // bypass std::vector::opeartor[] because they are too slow in the debug
+      // bypass std::vector::operator[] because they are too slow in the debug
       // mode, given the large number of samples.
       moments_data[i] += moment;
       ++moments_sample_count_data[i];
diff --git a/tensorflow/core/lib/strings/numbers.cc b/tensorflow/core/lib/strings/numbers.cc
index 859a654e36..778545b44b 100644
--- a/tensorflow/core/lib/strings/numbers.cc
+++ b/tensorflow/core/lib/strings/numbers.cc
@@ -33,7 +33,7 @@ char* FastInt32ToBufferLeft(int32 i, char* buffer) {
   if (i < 0) {
     *buffer++ = '-';
     // We need to do the negation in modular (i.e., "unsigned")
-    // arithmetic; MSVC++ apprently warns for plain "-u", so
+    // arithmetic; MSVC++ apparently warns for plain "-u", so
     // we write the equivalent expression "0 - u" instead.
     u = 0 - u;
   }
diff --git a/tensorflow/core/lib/strings/numbers.h b/tensorflow/core/lib/strings/numbers.h
index f1924ccf93..4dd0bcdec4 100644
--- a/tensorflow/core/lib/strings/numbers.h
+++ b/tensorflow/core/lib/strings/numbers.h
@@ -77,7 +77,7 @@ char* FloatToBuffer(float i, char* buffer);
 string FpToString(Fprint fp);
 
 // Attempt to parse a fingerprint in the form encoded by FpToString.  If
-// successsful, stores the fingerprint in *fp and returns true.  Otherwise,
+// successful, stores the fingerprint in *fp and returns true.  Otherwise,
 // returns false.
 bool StringToFp(const string& s, Fprint* fp);
 
diff --git a/tensorflow/core/lib/strings/strcat.h b/tensorflow/core/lib/strings/strcat.h
index 02632a6b2b..33b6028153 100644
--- a/tensorflow/core/lib/strings/strcat.h
+++ b/tensorflow/core/lib/strings/strcat.h
@@ -30,7 +30,7 @@ limitations under the License.
 // The AlphaNum type was designed to be used as the parameter type for StrCat().
 // Any routine accepting either a string or a number may accept it.
 // The basic idea is that by accepting a "const AlphaNum &" as an argument
-// to your function, your callers will automagically convert bools, integers,
+// to your function, your callers will automatically convert bools, integers,
 // and floating point values to strings for you.
 //
 // NOTE: Use of AlphaNum outside of the //strings package is unsupported except
diff --git a/tensorflow/core/ops/image_ops.cc b/tensorflow/core/ops/image_ops.cc
index d9c3d8d2c9..99c90a811b 100644
--- a/tensorflow/core/ops/image_ops.cc
+++ b/tensorflow/core/ops/image_ops.cc
@@ -446,22 +446,20 @@ Bounding boxes are supplied and returned as `[y_min, x_min, y_max, x_max]`. The
 bounding box coordinates are floats in `[0.0, 1.0]` relative to the width and
 height of the underlying image.
 
-Example:
-
-```
-# Generate a single distorted bounding box.
-begin, size, bbox_for_draw = tf.image.sample_distorted_bounding_box(
-   tf.shape(image),
-   bounding_boxes=bounding_boxes)
-
-# Draw the bounding box in an image summary.
-image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0),
-                                              bbox_for_draw)
-tf.image_summary('images_with_box', image_with_box)
-
-# Employ the bounding box to distort the image.
-distorted_image = tf.slice(image, begin, size)
-```
+For example,
+
+    # Generate a single distorted bounding box.
+    begin, size, bbox_for_draw = tf.image.sample_distorted_bounding_box(
+        tf.shape(image),
+        bounding_boxes=bounding_boxes)
+
+    # Draw the bounding box in an image summary.
+    image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0),
+                                                  bbox_for_draw)
+    tf.image_summary('images_with_box', image_with_box)
+
+    # Employ the bounding box to distort the image.
+    distorted_image = tf.slice(image, begin, size)
 
 Note that if no bounding box information is available, setting
 `use_image_if_no_bounding_boxes = true` will assume there is a single implicit
diff --git a/tensorflow/core/ops/ops.pbtxt b/tensorflow/core/ops/ops.pbtxt
index 644bb546b3..1a1e019230 100644
--- a/tensorflow/core/ops/ops.pbtxt
+++ b/tensorflow/core/ops/ops.pbtxt
@@ -7122,22 +7122,27 @@ op {
   name: "SampleDistortedBoundingBox"
   input_arg {
     name: "image_size"
+    description: "1-D, containing `[height, width, channels]`."
     type_attr: "T"
   }
   input_arg {
     name: "bounding_boxes"
+    description: "3-D with shape `[batch, N, 4]` describing the N bounding boxes\nassociated with the image."
     type: DT_FLOAT
   }
   output_arg {
     name: "begin"
+    description: "1-D, containing `[offset_height, offset_width, 0]`. Provide as input to\n`tf.slice`."
     type_attr: "T"
   }
   output_arg {
     name: "size"
+    description: "1-D, containing `[target_height, target_width, -1]`. Provide as input to\n`tf.slice`."
     type_attr: "T"
   }
   output_arg {
     name: "bboxes"
+    description: "3-D with shape `[1, 1, 4]` containing the distorted bounding box.\nProvide as input to `tf.image.draw_bounding_boxes`."
     type: DT_FLOAT
   }
   attr {
@@ -7159,6 +7164,7 @@ op {
     default_value {
       i: 0
     }
+    description: "If either `seed` or `seed2` is set to a non-zero value, the random number\ngenerator is seeded by the given `seed`.  Otherwise, it is seeded by a random\nseed."
   }
   attr {
     name: "seed2"
@@ -7166,6 +7172,7 @@ op {
     default_value {
       i: 0
     }
+    description: "A second seed to avoid seed collision."
   }
   attr {
     name: "min_object_covered"
@@ -7173,6 +7180,7 @@ op {
     default_value {
       f: 0.1
     }
+    description: "The cropped area of the image must contain at least this\nfraction of any bounding box supplied."
   }
   attr {
     name: "aspect_ratio_range"
@@ -7183,6 +7191,7 @@ op {
         f: 1.33
       }
     }
+    description: "The cropped area of the image must have an aspect ratio =\nwidth / height within this range."
   }
   attr {
     name: "area_range"
@@ -7193,6 +7202,7 @@ op {
         f: 1
       }
     }
+    description: "The cropped area of the image must contain a fraction of the\nsupplied image within this range."
   }
   attr {
     name: "max_attempts"
@@ -7200,6 +7210,7 @@ op {
     default_value {
       i: 100
     }
+    description: "Number of attempts at generating a cropped region of the image\nof the specified constraints. After `max_attempts` failures, return the entire\nimage."
   }
   attr {
     name: "use_image_if_no_bounding_boxes"
@@ -7207,9 +7218,10 @@ op {
     default_value {
       b: false
     }
+    description: "Controls behavior if no bounding boxes supplied.\nIf true, assume an implicit bounding box covering the whole input. If false,\nraise an error."
   }
   summary: "Generate a single randomly distorted bounding box for an image."
-  description: "Bounding box annotations are often supplied in addition to ground-truth labels\nin image recognition or object localization tasks. A common technique for\ntraining such a system is to randomly distort an image while preserving\nits content, i.e. *data augmentation*. This Op outputs a randomly distorted\nlocalization of an object, i.e. bounding box, given an `image_size`,\n`bounding_boxes` and a series of constraints.\n\nThe output of this Op is a single bounding box that may be used to crop the\noriginal image. The output is returned as 3 tensors: `begin`, `size` and\n`bboxes`. The first 2 tensors can be fed directly into `tf.slice` to crop the\nimage. The latter may be supplied to `tf.image.draw_bounding_box` to visualize\nwhat the bounding box looks like.\n\nBounding boxes are supplied and returned as `[y_min, x_min, y_max, x_max]`. The\nbounding box coordinates are floats in `[0.0, 1.0]` relative to the width and\nheight of the underlying image."
+  description: "Bounding box annotations are often supplied in addition to ground-truth labels\nin image recognition or object localization tasks. A common technique for\ntraining such a system is to randomly distort an image while preserving\nits content, i.e. *data augmentation*. This Op outputs a randomly distorted\nlocalization of an object, i.e. bounding box, given an `image_size`,\n`bounding_boxes` and a series of constraints.\n\nThe output of this Op is a single bounding box that may be used to crop the\noriginal image. The output is returned as 3 tensors: `begin`, `size` and\n`bboxes`. The first 2 tensors can be fed directly into `tf.slice` to crop the\nimage. The latter may be supplied to `tf.image.draw_bounding_box` to visualize\nwhat the bounding box looks like.\n\nBounding boxes are supplied and returned as `[y_min, x_min, y_max, x_max]`. The\nbounding box coordinates are floats in `[0.0, 1.0]` relative to the width and\nheight of the underlying image.\n\nFor example,\n\n    # Generate a single distorted bounding box.\n    begin, size, bbox_for_draw = tf.image.sample_distorted_bounding_box(\n        tf.shape(image),\n        bounding_boxes=bounding_boxes)\n\n    # Draw the bounding box in an image summary.\n    image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0),\n                                                  bbox_for_draw)\n    tf.image_summary(\'images_with_box\', image_with_box)\n\n    # Employ the bounding box to distort the image.\n    distorted_image = tf.slice(image, begin, size)\n\nNote that if no bounding box information is available, setting\n`use_image_if_no_bounding_boxes = true` will assume there is a single implicit\nbounding box covering the whole image. If `use_image_if_no_bounding_boxes` is\nfalse and no bounding boxes are supplied, an error is raised."
   is_stateful: true
 }
 op {
@@ -7482,7 +7494,7 @@ op {
     description: "If True, the assignment will be protected by a lock;\notherwise the behavior is undefined, but may exhibit less contention."
   }
   summary: "Applies sparse updates to a variable reference."
-  description: "This operation computes\n\n    # Scalar indices\n    ref[indices, ...] = updates[...]\n\n    # Vector indices (for each i)\n    ref[indices[i], ...] = updates[i, ...]\n\n    # High rank indices (for each i, ..., j)\n    ref[indices[i, ..., j], ...] = updates[i, ..., j, ...]\n\nThis operation outputs `ref` after the update is done.\nThis makes it easier to chain operations that need to use the reset value.\n\nIf `indices` contains duplicate entries, lexicographically later entries\noverride earlier entries.\n\nRequires `updates.shape = indices.shape + ref.shape[1:]`.\n\n<div style=\"width:70%; margin:auto; margin-bottom:10px; margin-top:20px;\">\n<img style=\"width:100%\" src=\"../../images/ScatterUpdate.png\" alt>\n</div>"
+  description: "This operation computes\n\n    # Scalar indices\n    ref[indices, ...] = updates[...]\n\n    # Vector indices (for each i)\n    ref[indices[i], ...] = updates[i, ...]\n\n    # High rank indices (for each i, ..., j)\n    ref[indices[i, ..., j], ...] = updates[i, ..., j, ...]\n\nThis operation outputs `ref` after the update is done.\nThis makes it easier to chain operations that need to use the reset value.\n\nIf values in `ref` are to be updated more than once, because there are\nduplicate entries in `indices`, the order in which the updates happen\nfor each value is undefined.\n\nRequires `updates.shape = indices.shape + ref.shape[1:]`.\n\n<div style=\"width:70%; margin:auto; margin-bottom:10px; margin-top:20px;\">\n<img style=\"width:100%\" src=\"../../images/ScatterUpdate.png\" alt>\n</div>"
 }
 op {
   name: "SegmentMax"
diff --git a/tensorflow/core/ops/state_ops.cc b/tensorflow/core/ops/state_ops.cc
index a5a3d14a07..3bec698acc 100644
--- a/tensorflow/core/ops/state_ops.cc
+++ b/tensorflow/core/ops/state_ops.cc
@@ -182,8 +182,9 @@ This operation computes
 This operation outputs `ref` after the update is done.
 This makes it easier to chain operations that need to use the reset value.
 
-If `indices` contains duplicate entries, lexicographically later entries
-override earlier entries.
+If values in `ref` are to be updated more than once, because there are
+duplicate entries in `indices`, the order in which the updates happen
+for each value is undefined.
 
 Requires `updates.shape = indices.shape + ref.shape[1:]`.
 
diff --git a/tensorflow/core/platform/default/build_config.bzl b/tensorflow/core/platform/default/build_config.bzl
index 9d069ab4c6..a89a0a4af7 100644
--- a/tensorflow/core/platform/default/build_config.bzl
+++ b/tensorflow/core/platform/default/build_config.bzl
@@ -3,9 +3,9 @@
 load("//google/protobuf:protobuf.bzl", "cc_proto_library")
 load("//google/protobuf:protobuf.bzl", "py_proto_library")
 
-# configure may change the following lines.
-CUDA_VERSION = '7.0'
-CUDNN_VERSION = '6.5'
+# configure may change the following lines to '.X.Y' or similar
+CUDA_VERSION = ''
+CUDNN_VERSION = ''
 
 # Appends a suffix to a list of deps.
 def tf_deps(deps, suffix):
diff --git a/tensorflow/core/platform/default/build_config/BUILD b/tensorflow/core/platform/default/build_config/BUILD
index c6dccc06ff..4b8088fde8 100644
--- a/tensorflow/core/platform/default/build_config/BUILD
+++ b/tensorflow/core/platform/default/build_config/BUILD
@@ -75,7 +75,7 @@ filegroup(
 cc_library(
     name = "cuda",
     data = [
-        "//third_party/gpus/cuda:lib64/libcudart.so." + tf_get_cuda_version(),
+        "//third_party/gpus/cuda:lib64/libcudart.so" + tf_get_cuda_version(),
     ],
     linkopts = [
         "-Wl,-rpath,third_party/gpus/cuda/lib64",
diff --git a/tensorflow/core/platform/default/mutex.h b/tensorflow/core/platform/default/mutex.h
index d8ba37babc..18395f3292 100644
--- a/tensorflow/core/platform/default/mutex.h
+++ b/tensorflow/core/platform/default/mutex.h
@@ -22,6 +22,7 @@ limitations under the License.
 #include <chrono>
 #include <condition_variable>
 #include <mutex>
+#include "tensorflow/core/platform/default/thread_annotations.h"
 
 namespace tensorflow {
 
@@ -29,16 +30,24 @@ enum LinkerInitialized { LINKER_INITIALIZED };
 
 // A class that wraps around the std::mutex implementation, only adding an
 // additional LinkerInitialized constructor interface.
-class mutex : public std::mutex {
+class LOCKABLE mutex : public std::mutex {
  public:
   mutex() {}
   // The default implementation of std::mutex is safe to use after the linker
   // initializations
   explicit mutex(LinkerInitialized x) {}
+
+  void lock() ACQUIRE() { std::mutex::lock(); }
+  void unlock() RELEASE() { std::mutex::unlock(); }
+};
+
+class SCOPED_LOCKABLE mutex_lock : public std::unique_lock<std::mutex> {
+ public:
+  mutex_lock(class mutex& m) ACQUIRE(m) : std::unique_lock<std::mutex>(m) {}
+  ~mutex_lock() RELEASE() {}
 };
 
 using std::condition_variable;
-typedef std::unique_lock<std::mutex> mutex_lock;
 
 inline ConditionResult WaitForMilliseconds(mutex_lock* mu,
                                            condition_variable* cv, int64 ms) {
diff --git a/tensorflow/core/platform/default/thread_annotations.h b/tensorflow/core/platform/default/thread_annotations.h
index d8a9253926..46143b2ea3 100644
--- a/tensorflow/core/platform/default/thread_annotations.h
+++ b/tensorflow/core/platform/default/thread_annotations.h
@@ -73,6 +73,15 @@ limitations under the License.
 #define ACQUIRED_BEFORE(...) \
   THREAD_ANNOTATION_ATTRIBUTE__(acquired_before(__VA_ARGS__))
 
+#define ACQUIRE(...) \
+  THREAD_ANNOTATION_ATTRIBUTE__(acquire_capability(__VA_ARGS__))
+
+#define ACQUIRE_SHARED(...) \
+  THREAD_ANNOTATION_ATTRIBUTE__(acquire_shared_capability(__VA_ARGS__))
+
+#define RELEASE(...) \
+  THREAD_ANNOTATION_ATTRIBUTE__(release_capability(__VA_ARGS__))
+
 // Document a function that expects a mutex to be held prior to entry.
 // The mutex is expected to be held both on entry to and exit from the
 // function.
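
Note on the annotation changes above: `ACQUIRE` and `RELEASE` let Clang's thread-safety analysis follow a lock through a wrapper class, which is why `mutex_lock` becomes a real class rather than a typedef. The sketch below shows the same pattern in isolation. It is a simplified illustration, not the TensorFlow code: every `SKETCH_*` macro and the `SketchMutex`, `SketchLock`, and `Counter` classes are hypothetical names, and the attributes expand to nothing on compilers other than Clang.

```cpp
#include <mutex>

// Hypothetical stand-ins for the macros in thread_annotations.h: they expand
// to Clang thread-safety attributes when available, otherwise to nothing.
#if defined(__clang__)
#define SKETCH_ATTR(x) __attribute__((x))
#else
#define SKETCH_ATTR(x)
#endif
#define SKETCH_CAPABILITY SKETCH_ATTR(capability("mutex"))
#define SKETCH_SCOPED SKETCH_ATTR(scoped_lockable)
#define SKETCH_ACQUIRE(...) SKETCH_ATTR(acquire_capability(__VA_ARGS__))
#define SKETCH_RELEASE(...) SKETCH_ATTR(release_capability(__VA_ARGS__))
#define SKETCH_GUARDED_BY(x) SKETCH_ATTR(guarded_by(x))

// A lockable type: lock() acquires the capability, unlock() releases it.
class SKETCH_CAPABILITY SketchMutex {
 public:
  void lock() SKETCH_ACQUIRE() { mu_.lock(); }
  void unlock() SKETCH_RELEASE() { mu_.unlock(); }

 private:
  std::mutex mu_;
};

// A scoped lock: the constructor acquires m, the destructor releases it.
class SKETCH_SCOPED SketchLock {
 public:
  explicit SketchLock(SketchMutex& m) SKETCH_ACQUIRE(m) : m_(m) { m_.lock(); }
  ~SketchLock() SKETCH_RELEASE() { m_.unlock(); }

 private:
  SketchMutex& m_;
};

// value_ may only be touched while mu_ is held; the analysis can verify this.
class Counter {
 public:
  int Increment() {
    SketchLock l(mu_);
    return ++value_;
  }

 private:
  SketchMutex mu_;
  int value_ SKETCH_GUARDED_BY(mu_) = 0;
};
```

When built with Clang and `-Wthread-safety`, removing the `SketchLock` line from `Increment` should produce a warning about writing `value_` without holding `mu_`.
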
diff --git a/tensorflow/core/public/version.h b/tensorflow/core/public/version.h
index 4c0a9b8d70..9d46087814 100644
--- a/tensorflow/core/public/version.h
+++ b/tensorflow/core/public/version.h
@@ -19,7 +19,7 @@ limitations under the License.
 // TensorFlow uses semantic versioning, see http://semver.org/.
 
 #define TF_MAJOR_VERSION 0
-#define TF_MINOR_VERSION 6
+#define TF_MINOR_VERSION 7
 #define TF_PATCH_VERSION 0
 
 // TF_VERSION_SUFFIX is non-empty for pre-releases (e.g. "-alpha", "-alpha.1",
diff --git a/tensorflow/core/util/cuda_kernel_helper.h b/tensorflow/core/util/cuda_kernel_helper.h
index 4e1dfb8c4f..c98207b6f0 100644
--- a/tensorflow/core/util/cuda_kernel_helper.h
+++ b/tensorflow/core/util/cuda_kernel_helper.h
@@ -20,6 +20,8 @@ limitations under the License.
 
 #include <algorithm>
 
+#include "tensorflow/core/platform/types.h"
+
 #define CUDA_1D_KERNEL_LOOP(i, n)                            \
   for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
        i += blockDim.x * gridDim.x)
@@ -69,6 +71,58 @@ __device__ __host__ inline T ldg(const T* address) {
 #endif
 }
 
+// CUDA provides atomic ops, but not for all types.  We provide wrappers
+// for some ops and provide implementation for all reasonable types.
+#define CUDA_ATOMIC_WRAPPER(op, T)                                      \
+  __device__ __forceinline__ T CudaAtomic##op(T* address, T val)
+
+#define USE_CUDA_ATOMIC(op, T)       \
+  CUDA_ATOMIC_WRAPPER(op, T) {       \
+    return atomic##op(address, val); \
+  }
+
+// For atomicAdd.
+USE_CUDA_ATOMIC(Add, int32);
+USE_CUDA_ATOMIC(Add, uint32);
+USE_CUDA_ATOMIC(Add, uint64);
+USE_CUDA_ATOMIC(Add, float);
+
+// Custom implementation of atomicAdd for double.
+// This implementation is copied from the CUDA manual.
+CUDA_ATOMIC_WRAPPER(Add, double) {
+  uint64* address_as_ull = (uint64*)address;
+  uint64 old = *address_as_ull, assumed;
+
+  do {
+    assumed = old;
+    old = atomicCAS(address_as_ull, assumed,
+                    __double_as_longlong(val + __longlong_as_double(assumed)));
+
+    // Note: uses integer comparison to avoid hang in case of NaN
+  } while (assumed != old);
+
+  return __longlong_as_double(old);
+}
+
+// For atomicSub.
+
+// Custom implementation for sub by just negating the value.
+#define WRAPPED_ATOMIC_SUB(T)                       \
+  CUDA_ATOMIC_WRAPPER(Sub, T) {                     \
+    return CudaAtomicAdd(address, -val);            \
+  }
+
+WRAPPED_ATOMIC_SUB(uint64);
+WRAPPED_ATOMIC_SUB(int32);
+WRAPPED_ATOMIC_SUB(uint32);
+WRAPPED_ATOMIC_SUB(float);
+WRAPPED_ATOMIC_SUB(double);
+
+#undef WRAPPED_ATOMIC_SUB
+
+#undef USE_CUDA_ATOMIC
+#undef CUDA_ATOMIC_WRAPPER
+
 }  // namespace tensorflow
 
 #endif  // GOOGLE_CUDA
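
Note on the double-precision `CudaAtomicAdd` above: it builds an atomic add out of a compare-and-swap loop on the 64-bit pattern of the value, retrying whenever another thread changed the location in between. The host-side C++ sketch below shows the same technique with `std::atomic<uint64_t>` in place of `atomicCAS`. It is an analogue for illustration, not the device code: `AtomicAddDouble` is a hypothetical name, and the sketch assumes `double` is 64 bits wide.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit double");

// Hypothetical host-side analogue of the CAS loop used for double atomic add.
// The cell stores the bit pattern of a double. Returns the previous value.
inline double AtomicAddDouble(std::atomic<uint64_t>* address, double val) {
  uint64_t old_bits = address->load();
  uint64_t new_bits;
  double old_val;
  do {
    // Reinterpret the currently assumed bits as a double and add to it.
    std::memcpy(&old_val, &old_bits, sizeof(old_val));
    const double new_val = old_val + val;
    std::memcpy(&new_bits, &new_val, sizeof(new_bits));
    // On failure compare_exchange_weak reloads old_bits with the stored
    // value, so the next iteration recomputes the sum. The comparison is on
    // integer bits, so a NaN in the cell cannot cause an endless loop.
  } while (!address->compare_exchange_weak(old_bits, new_bits));
  return old_val;
}
```

Subtraction follows the same shape as `WRAPPED_ATOMIC_SUB` in the diff: calling `AtomicAddDouble(&cell, -x)` subtracts `x`.
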
diff --git a/tensorflow/core/util/events_writer.cc b/tensorflow/core/util/events_writer.cc
index cfdbb07cd5..6b2a526834 100644
--- a/tensorflow/core/util/events_writer.cc
+++ b/tensorflow/core/util/events_writer.cc
@@ -115,7 +115,7 @@ bool EventsWriter::Flush() {
   // recordio_writer_->Sync() can return true even if the underlying
   // file has been deleted.  EventWriter.FileDeletionBeforeWriting
   // demonstrates this and will fail if the FileHasDisappeared()
-  // conditon is removed.
+  // condition is removed.
   // Also, we deliberately attempt to Sync() before checking for a
   // disappearing file, in case for some file system File::Exists() is
   // false after File::Open() but before File::Sync().
diff --git a/tensorflow/examples/tutorials/mnist/input_data.py b/tensorflow/examples/tutorials/mnist/input_data.py
index 8493e257a3..20affa9dae 100644
--- a/tensorflow/examples/tutorials/mnist/input_data.py
+++ b/tensorflow/examples/tutorials/mnist/input_data.py
@@ -69,7 +69,7 @@ def extract_images(filename):
     return data
 
 
-def dense_to_one_hot(labels_dense, num_classes=10):
+def dense_to_one_hot(labels_dense, num_classes):
   """Convert class labels from scalars to one-hot vectors."""
   num_labels = labels_dense.shape[0]
   index_offset = numpy.arange(num_labels) * num_classes
@@ -78,7 +78,7 @@ def dense_to_one_hot(labels_dense, num_classes=10):
   return labels_one_hot
 
 
-def extract_labels(filename, one_hot=False):
+def extract_labels(filename, one_hot=False, num_classes=10):
   """Extract the labels into a 1D uint8 numpy array [index]."""
   print('Extracting', filename)
   with tf.gfile.Open(filename) as f, gzip.GzipFile(fileobj=f) as bytestream:
@@ -91,7 +91,7 @@ def extract_labels(filename, one_hot=False):
     buf = bytestream.read(num_items)
     labels = numpy.frombuffer(buf, dtype=numpy.uint8)
     if one_hot:
-      return dense_to_one_hot(labels)
+      return dense_to_one_hot(labels, num_classes)
     return labels
 
 
diff --git a/tensorflow/examples/udacity/2_fullyconnected.ipynb b/tensorflow/examples/udacity/2_fullyconnected.ipynb
index 2bf5a7f937..c8815f631b 100644
--- a/tensorflow/examples/udacity/2_fullyconnected.ipynb
+++ b/tensorflow/examples/udacity/2_fullyconnected.ipynb
@@ -61,7 +61,7 @@
         "colab_type": "text"
       },
       "source": [
-        "First reload the data we generated in `1_notmist.ipynb`."
+        "First reload the data we generated in `1_notmnist.ipynb`."
       ]
     },
     {
@@ -583,4 +583,4 @@
       ]
     }
   ]
-}
-\ No newline at end of file
+}
diff --git a/tensorflow/examples/udacity/6_lstm.ipynb b/tensorflow/examples/udacity/6_lstm.ipynb
index a1ef14b787..75a7027784 100644
--- a/tensorflow/examples/udacity/6_lstm.ipynb
+++ b/tensorflow/examples/udacity/6_lstm.ipynb
@@ -410,7 +410,7 @@
         "\n",
         "def characters(probabilities):\n",
         "  \"\"\"Turn a 1-hot encoding or a probability distribution over the possible\n",
-        "  characters back into its (mostl likely) character representation.\"\"\"\n",
+        "  characters back into its (most likely) character representation.\"\"\"\n",
         "  return [id2char(c) for c in np.argmax(probabilities, 1)]\n",
         "\n",
         "def batches2string(batches):\n",
diff --git a/tensorflow/g3doc/api_docs/cc/ClassEnv.md b/tensorflow/g3doc/api_docs/cc/ClassEnv.md
index 9dc957cd38..1bea893187 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassEnv.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassEnv.md
@@ -1,4 +1,4 @@
-# Class `tensorflow::Env`
+# `class tensorflow::Env`
 
 An interface used by the tensorflow implementation to access operating system functionality like the filesystem etc.
 
@@ -6,44 +6,7 @@ Callers may wish to provide a custom Env object to get fine grain control.
 
 All Env implementations are safe for concurrent access from multiple threads without any external synchronization.
 
-##Member Summary
-
-* [`tensorflow::Env::Env()`](#tensorflow_Env_Env)
-* [`tensorflow::Env::~Env()`](#tensorflow_Env_Env)
-* [`virtual Status tensorflow::Env::NewRandomAccessFile(const string &fname, RandomAccessFile **result)=0`](#virtual_Status_tensorflow_Env_NewRandomAccessFile)
-  * Creates a brand new random access read-only file with the specified name.
-* [`virtual Status tensorflow::Env::NewWritableFile(const string &fname, WritableFile **result)=0`](#virtual_Status_tensorflow_Env_NewWritableFile)
-  * Creates an object that writes to a new file with the specified name.
-* [`virtual Status tensorflow::Env::NewAppendableFile(const string &fname, WritableFile **result)=0`](#virtual_Status_tensorflow_Env_NewAppendableFile)
-  * Creates an object that either appends to an existing file, or writes to a new file (if the file does not exist to begin with).
-* [`virtual bool tensorflow::Env::FileExists(const string &fname)=0`](#virtual_bool_tensorflow_Env_FileExists)
-  * Returns true iff the named file exists.
-* [`virtual Status tensorflow::Env::GetChildren(const string &dir, std::vector< string > *result)=0`](#virtual_Status_tensorflow_Env_GetChildren)
-  * Stores in *result the names of the children of the specified directory. The names are relative to "dir".
-* [`virtual Status tensorflow::Env::DeleteFile(const string &fname)=0`](#virtual_Status_tensorflow_Env_DeleteFile)
-  * Deletes the named file.
-* [`virtual Status tensorflow::Env::CreateDir(const string &dirname)=0`](#virtual_Status_tensorflow_Env_CreateDir)
-  * Creates the specified directory.
-* [`virtual Status tensorflow::Env::DeleteDir(const string &dirname)=0`](#virtual_Status_tensorflow_Env_DeleteDir)
-  * Deletes the specified directory.
-* [`virtual Status tensorflow::Env::GetFileSize(const string &fname, uint64 *file_size)=0`](#virtual_Status_tensorflow_Env_GetFileSize)
-  * Stores the size of `fname` in `*file_size`.
-* [`virtual Status tensorflow::Env::RenameFile(const string &src, const string &target)=0`](#virtual_Status_tensorflow_Env_RenameFile)
-  * Renames file src to target. If target already exists, it will be replaced.
-* [`virtual uint64 tensorflow::Env::NowMicros()=0`](#virtual_uint64_tensorflow_Env_NowMicros)
-  * Returns the number of micro-seconds since some fixed point in time. Only useful for computing deltas of time.
-* [`virtual void tensorflow::Env::SleepForMicroseconds(int micros)=0`](#virtual_void_tensorflow_Env_SleepForMicroseconds)
-  * Sleeps/delays the thread for the prescribed number of micro-seconds.
-* [`virtual Thread* tensorflow::Env::StartThread(const ThreadOptions &thread_options, const string &name, std::function< void()> fn) TF_MUST_USE_RESULT=0`](#virtual_Thread_tensorflow_Env_StartThread)
-  * Returns a new thread that is running fn() and is identified (for debugging/performance-analysis) by "name".
-* [`virtual void tensorflow::Env::SchedClosure(std::function< void()> closure)=0`](#virtual_void_tensorflow_Env_SchedClosure)
-* [`virtual void tensorflow::Env::SchedClosureAfter(int micros, std::function< void()> closure)=0`](#virtual_void_tensorflow_Env_SchedClosureAfter)
-* [`virtual Status tensorflow::Env::LoadLibrary(const char *library_filename, void **handle)=0`](#virtual_Status_tensorflow_Env_LoadLibrary)
-* [`virtual Status tensorflow::Env::GetSymbolFromLibrary(void *handle, const char *symbol_name, void **symbol)=0`](#virtual_Status_tensorflow_Env_GetSymbolFromLibrary)
-* [`static Env* tensorflow::Env::Default()`](#static_Env_tensorflow_Env_Default)
-  * Returns a default environment suitable for the current operating system.
-
-##Member Details
+###Member Details
 
 #### `tensorflow::Env::Env()` {#tensorflow_Env_Env}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md b/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md
index db060094ed..aab4d735c5 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md
@@ -1,48 +1,10 @@
-# Class `tensorflow::EnvWrapper`
+# `class tensorflow::EnvWrapper`
 
 An implementation of Env that forwards all calls to another Env .
 
 May be useful to clients who wish to override just part of the functionality of another Env .
 
-##Member Summary
-
-* [`tensorflow::EnvWrapper::EnvWrapper(Env *t)`](#tensorflow_EnvWrapper_EnvWrapper)
-  * Initializes an EnvWrapper that delegates all calls to *t.
-* [`tensorflow::EnvWrapper::~EnvWrapper()`](#tensorflow_EnvWrapper_EnvWrapper)
-* [`Env* tensorflow::EnvWrapper::target() const`](#Env_tensorflow_EnvWrapper_target)
-  * Returns the target to which this Env forwards all calls.
-* [`Status tensorflow::EnvWrapper::NewRandomAccessFile(const string &f, RandomAccessFile **r) override`](#Status_tensorflow_EnvWrapper_NewRandomAccessFile)
-  * Creates a brand new random access read-only file with the specified name.
-* [`Status tensorflow::EnvWrapper::NewWritableFile(const string &f, WritableFile **r) override`](#Status_tensorflow_EnvWrapper_NewWritableFile)
-  * Creates an object that writes to a new file with the specified name.
-* [`Status tensorflow::EnvWrapper::NewAppendableFile(const string &f, WritableFile **r) override`](#Status_tensorflow_EnvWrapper_NewAppendableFile)
-  * Creates an object that either appends to an existing file, or writes to a new file (if the file does not exist to begin with).
-* [`bool tensorflow::EnvWrapper::FileExists(const string &f) override`](#bool_tensorflow_EnvWrapper_FileExists)
-  * Returns true iff the named file exists.
-* [`Status tensorflow::EnvWrapper::GetChildren(const string &dir, std::vector< string > *r) override`](#Status_tensorflow_EnvWrapper_GetChildren)
-  * Stores in *result the names of the children of the specified directory. The names are relative to "dir".
-* [`Status tensorflow::EnvWrapper::DeleteFile(const string &f) override`](#Status_tensorflow_EnvWrapper_DeleteFile)
-  * Deletes the named file.
-* [`Status tensorflow::EnvWrapper::CreateDir(const string &d) override`](#Status_tensorflow_EnvWrapper_CreateDir)
-  * Creates the specified directory.
-* [`Status tensorflow::EnvWrapper::DeleteDir(const string &d) override`](#Status_tensorflow_EnvWrapper_DeleteDir)
-  * Deletes the specified directory.
-* [`Status tensorflow::EnvWrapper::GetFileSize(const string &f, uint64 *s) override`](#Status_tensorflow_EnvWrapper_GetFileSize)
-  * Stores the size of `fname` in `*file_size`.
-* [`Status tensorflow::EnvWrapper::RenameFile(const string &s, const string &t) override`](#Status_tensorflow_EnvWrapper_RenameFile)
-  * Renames file src to target. If target already exists, it will be replaced.
-* [`uint64 tensorflow::EnvWrapper::NowMicros() override`](#uint64_tensorflow_EnvWrapper_NowMicros)
-  * Returns the number of micro-seconds since some fixed point in time. Only useful for computing deltas of time.
-* [`void tensorflow::EnvWrapper::SleepForMicroseconds(int micros) override`](#void_tensorflow_EnvWrapper_SleepForMicroseconds)
-  * Sleeps/delays the thread for the prescribed number of micro-seconds.
-* [`Thread* tensorflow::EnvWrapper::StartThread(const ThreadOptions &thread_options, const string &name, std::function< void()> fn) override`](#Thread_tensorflow_EnvWrapper_StartThread)
-  * Returns a new thread that is running fn() and is identified (for debugging/performance-analysis) by "name".
-* [`void tensorflow::EnvWrapper::SchedClosure(std::function< void()> closure) override`](#void_tensorflow_EnvWrapper_SchedClosure)
-* [`void tensorflow::EnvWrapper::SchedClosureAfter(int micros, std::function< void()> closure) override`](#void_tensorflow_EnvWrapper_SchedClosureAfter)
-* [`Status tensorflow::EnvWrapper::LoadLibrary(const char *library_filename, void **handle) override`](#Status_tensorflow_EnvWrapper_LoadLibrary)
-* [`Status tensorflow::EnvWrapper::GetSymbolFromLibrary(void *handle, const char *symbol_name, void **symbol) override`](#Status_tensorflow_EnvWrapper_GetSymbolFromLibrary)
-
-##Member Details
+###Member Details
 
 #### `tensorflow::EnvWrapper::EnvWrapper(Env *t)` {#tensorflow_EnvWrapper_EnvWrapper}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShape.md b/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShape.md
index 005250c6ad..b9afae0152 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShape.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShape.md
@@ -1,42 +1,16 @@
-# Class `tensorflow::PartialTensorShape`
+# `class tensorflow::PartialTensorShape`
 
 Manages the partially known dimensions of a Tensor and their sizes.
 
 
 
-##Member Summary
-
-* [`tensorflow::PartialTensorShape::PartialTensorShape(gtl::ArraySlice< int64 > dim_sizes)`](#tensorflow_PartialTensorShape_PartialTensorShape)
-  * Construct a ` PartialTensorShape ` from the provided sizes. REQUIRES: `dim_sizes[i] >= 0`
-* [`tensorflow::PartialTensorShape::PartialTensorShape(std::initializer_list< int64 > dim_sizes)`](#tensorflow_PartialTensorShape_PartialTensorShape)
-* [`tensorflow::PartialTensorShape::PartialTensorShape(const TensorShapeProto &proto)`](#tensorflow_PartialTensorShape_PartialTensorShape)
-  * REQUIRES: `IsValid(proto)`
-* [`PartialTensorShape tensorflow::PartialTensorShape::Concatenate(int64 size) const`](#PartialTensorShape_tensorflow_PartialTensorShape_Concatenate)
-* [`PartialTensorShape tensorflow::PartialTensorShape::Concatenate(const PartialTensorShape &shape) const`](#PartialTensorShape_tensorflow_PartialTensorShape_Concatenate)
-* [`Status tensorflow::PartialTensorShape::MergeWith(const PartialTensorShape &shape, PartialTensorShape *result) const`](#Status_tensorflow_PartialTensorShape_MergeWith)
-* [`int tensorflow::PartialTensorShape::dims() const`](#int_tensorflow_PartialTensorShape_dims)
-  * Return the number of dimensions in the tensor.
-* [`bool tensorflow::PartialTensorShape::IsFullyDefined() const`](#bool_tensorflow_PartialTensorShape_IsFullyDefined)
-  * Return true iff the rank and all of the dimensions are well defined.
-* [`bool tensorflow::PartialTensorShape::IsCompatibleWith(const PartialTensorShape &shape) const`](#bool_tensorflow_PartialTensorShape_IsCompatibleWith)
-* [`bool tensorflow::PartialTensorShape::IsCompatibleWith(const TensorShape &shape) const`](#bool_tensorflow_PartialTensorShape_IsCompatibleWith)
-* [`int64 tensorflow::PartialTensorShape::dim_size(int d) const`](#int64_tensorflow_PartialTensorShape_dim_size)
-  * Returns the number of elements in dimension `d`. REQUIRES: `0 <= d < dims() `
-* [`gtl::ArraySlice<int64> tensorflow::PartialTensorShape::dim_sizes() const`](#gtl_ArraySlice_int64_tensorflow_PartialTensorShape_dim_sizes)
-  * Returns sizes of all dimensions.
-* [`void tensorflow::PartialTensorShape::AsProto(TensorShapeProto *proto) const`](#void_tensorflow_PartialTensorShape_AsProto)
-  * Fill `*proto` from `*this`.
-* [`bool tensorflow::PartialTensorShape::AsTensorShape(TensorShape *tensor_shape) const`](#bool_tensorflow_PartialTensorShape_AsTensorShape)
-* [`string tensorflow::PartialTensorShape::DebugString() const`](#string_tensorflow_PartialTensorShape_DebugString)
-  * For error messages.
-* [`bool tensorflow::PartialTensorShape::IsValid(const TensorShapeProto &proto)`](#bool_tensorflow_PartialTensorShape_IsValid)
-  * Returns `true` iff `proto` is a valid partial tensor shape.
-* [`Status tensorflow::PartialTensorShape::IsValidShape(const TensorShapeProto &proto)`](#Status_tensorflow_PartialTensorShape_IsValidShape)
-* [`string tensorflow::PartialTensorShape::DebugString(const TensorShapeProto &proto)`](#string_tensorflow_PartialTensorShape_DebugString)
-* [`Status tensorflow::PartialTensorShape::MakePartialShape(const T *dims, int n, PartialTensorShape *out)`](#Status_tensorflow_PartialTensorShape_MakePartialShape)
-  * Returns a ` PartialTensorShape ` whose dimensions are `dims[0]`, `dims[1]`, ..., `dims[n-1]`. Values of -1 are considered "unknown".
-
-##Member Details
+###Member Details
+
+#### `tensorflow::PartialTensorShape::PartialTensorShape()` {#tensorflow_PartialTensorShape_PartialTensorShape}
+
+Construct an unknown ` PartialTensorShape `.
+
+
 
 #### `tensorflow::PartialTensorShape::PartialTensorShape(gtl::ArraySlice< int64 > dim_sizes)` {#tensorflow_PartialTensorShape_PartialTensorShape}
 
@@ -76,9 +50,9 @@ Merges all the dimensions from `shape`. Returns `InvalidArgument` error if eithe
 
 #### `int tensorflow::PartialTensorShape::dims() const` {#int_tensorflow_PartialTensorShape_dims}
 
-Return the number of dimensions in the tensor.
 
 
+Return the number of dimensions in the tensor. If the number of dimensions is unknown, return -1.
 
 #### `bool tensorflow::PartialTensorShape::IsFullyDefined() const` {#bool_tensorflow_PartialTensorShape_IsFullyDefined}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShapeUtils.md b/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShapeUtils.md
index 484a9c2094..18e30f7f1d 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShapeUtils.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassPartialTensorShapeUtils.md
@@ -1,15 +1,10 @@
-# Class `tensorflow::PartialTensorShapeUtils`
+# `class tensorflow::PartialTensorShapeUtils`
 
 Static helper routines for ` PartialTensorShape `. Includes a few common predicates on a partially known tensor shape.
 
 
 
-##Member Summary
-
-* [`static string tensorflow::PartialTensorShapeUtils::PartialShapeListString(const gtl::ArraySlice< PartialTensorShape > &shapes)`](#static_string_tensorflow_PartialTensorShapeUtils_PartialShapeListString)
-* [`static bool tensorflow::PartialTensorShapeUtils::AreCompatible(const gtl::ArraySlice< PartialTensorShape > &shapes0, const gtl::ArraySlice< PartialTensorShape > &shapes1)`](#static_bool_tensorflow_PartialTensorShapeUtils_AreCompatible)
-
-##Member Details
+###Member Details
 
 #### `static string tensorflow::PartialTensorShapeUtils::PartialShapeListString(const gtl::ArraySlice< PartialTensorShape > &shapes)` {#static_string_tensorflow_PartialTensorShapeUtils_PartialShapeListString}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md b/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md
index 6b97b02571..1ff484c083 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md
@@ -1,17 +1,10 @@
-# Class `tensorflow::RandomAccessFile`
+# `class tensorflow::RandomAccessFile`
 
 A file abstraction for randomly reading the contents of a file.
 
 
 
-##Member Summary
-
-* [`tensorflow::RandomAccessFile::RandomAccessFile()`](#tensorflow_RandomAccessFile_RandomAccessFile)
-* [`tensorflow::RandomAccessFile::~RandomAccessFile()`](#tensorflow_RandomAccessFile_RandomAccessFile)
-* [`virtual Status tensorflow::RandomAccessFile::Read(uint64 offset, size_t n, StringPiece *result, char *scratch) const =0`](#virtual_Status_tensorflow_RandomAccessFile_Read)
-  * Reads up to `n` bytes from the file starting at `offset`.
-
-##Member Details
+###Member Details
 
 #### `tensorflow::RandomAccessFile::RandomAccessFile()` {#tensorflow_RandomAccessFile_RandomAccessFile}
 
@@ -25,7 +18,7 @@ A file abstraction for randomly reading the contents of a file.
 
 
 
-#### `virtual Status tensorflow::RandomAccessFile::Read(uint64 offset, size_t n, StringPiece *result, char *scratch) const =0` {#virtual_Status_tensorflow_RandomAccessFile_Read}
+#### `virtual Status tensorflow::RandomAccessFile::Read(uint64 offset, size_t n, StringPiece *result, char *scratch) const  =0` {#virtual_Status_tensorflow_RandomAccessFile_Read}
 
 Reads up to `n` bytes from the file starting at `offset`.
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassSession.md b/tensorflow/g3doc/api_docs/cc/ClassSession.md
index ffe51ca310..c4a7e9bb02 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassSession.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassSession.md
@@ -1,4 +1,4 @@
-# Class `tensorflow::Session`
+# `class tensorflow::Session`
 
 A Session instance lets a caller drive a TensorFlow graph computation.
 
@@ -6,54 +6,13 @@ When a Session is created with a given target, a new Session object is bound to
 
 Example:
 
-```c++ tensorflow::GraphDef graph;
-// ... Create or load graph into "graph".
-
-// This example uses the default options which connects
-// to a local runtime.
-tensorflow::SessionOptions options;
-std::unique_ptr<tensorflow::Session>
-session(tensorflow::NewSession(options));
-
-// Create the session with this graph.
-tensorflow::Status s = session->Create(graph);
-if (!s.ok()) { ... }
-
-// Run the graph and fetch the first output of the "output"
-// operation, and also run to but do not return anything
-// for the "update_state" operation.
-std::vector<tensorflow::Tensor> outputs;
-s = session->Run({}, {"output:0"}, {"update_state"}, &outputs);
-if (!s.ok()) { ... }
-
-// Map the output as a flattened float tensor, and do something
-// with it.
-auto output_tensor = outputs[0].flat<float>();
-if (output_tensor(0) > 0.5) { ... }
-
-// Close the session to release the resources associated with
-// this session.
-session->Close();
-
-```
+{c++}   tensorflow::GraphDef graph;  // ... Create or load graph into "graph".   // This example uses the default options which connects  // to a local runtime.  tensorflow::SessionOptions options;  std::unique_ptr<tensorflow::Session>  session(tensorflow::NewSession(options));   // Create the session with this graph.  tensorflow::Status s = session->Create(graph);  if (!s.ok()) { ... }   // Run the graph and fetch the first output of the "output"  // operation, and also run to but do not return anything  // for the "update_state" operation.  std::vector<tensorflow::Tensor> outputs;  s = session->Run({}, {"output:0"}, {"update_state"}, &outputs);  if (!s.ok()) { ... }   // Map the output as a flattened float tensor, and do something  // with it.  auto output_tensor = outputs[0].flat<float>();  if (output_tensor(0) > 0.5) { ... }   // Close the session to release the resources associated with  // this session.  session->Close();
 
 A Session allows concurrent calls to Run() , though a Session must be created / extended by a single thread.
 
 Only one thread must call Close() , and Close() must only be called after all other calls to Run() have returned.
 
-##Member Summary
-
-* [`virtual Status tensorflow::Session::Create(const GraphDef &graph)=0`](#virtual_Status_tensorflow_Session_Create)
-  * Create the graph to be used for the session.
-* [`virtual Status tensorflow::Session::Extend(const GraphDef &graph)=0`](#virtual_Status_tensorflow_Session_Extend)
-  * Adds operations to the graph that is already registered with the Session .
-* [`virtual Status tensorflow::Session::Run(const std::vector< std::pair< string, Tensor > > &inputs, const std::vector< string > &output_tensor_names, const std::vector< string > &target_node_names, std::vector< Tensor > *outputs)=0`](#virtual_Status_tensorflow_Session_Run)
-  * Runs the graph with the provided input tensors and fills `outputs` for the endpoints specified in `output_tensor_names`. Runs to but does not return Tensors for the nodes in `target_node_names`.
-* [`virtual Status tensorflow::Session::Close()=0`](#virtual_Status_tensorflow_Session_Close)
-  * Closes this session.
-* [`virtual tensorflow::Session::~Session()`](#virtual_tensorflow_Session_Session)
-
-##Member Details
+###Member Details
 
 #### `virtual Status tensorflow::Session::Create(const GraphDef &graph)=0` {#virtual_Status_tensorflow_Session_Create}
 
@@ -79,6 +38,18 @@ REQUIRES: The name of each Tensor of the input or output must match a "Tensor en
 
 REQUIRES: outputs is not nullptr if `output_tensor_names` is non-empty.
 
+#### `virtual Status tensorflow::Session::PRunSetup(const std::vector< string > &input_names, const std::vector< string > &output_names, const std::vector< string > &target_nodes, string *handle)` {#virtual_Status_tensorflow_Session_PRunSetup}
+
+Sets up a graph for partial execution. All future feeds and fetches are specified by &apos;input_names&apos; and &apos;output_names&apos;. Returns &apos;handle&apos; that can be used to perform a sequence of partial feeds and fetches. NOTE: This API is still experimental and may change.
+
+
+
+#### `virtual Status tensorflow::Session::PRun(const string &handle, const std::vector< std::pair< string, Tensor > > &inputs, const std::vector< string > &output_names, std::vector< Tensor > *outputs)` {#virtual_Status_tensorflow_Session_PRun}
+
+Continues the pending execution specified by &apos;handle&apos; with the provided input tensors and fills `outputs` for the endpoints specified in `output_names`. NOTE: This API is still experimental and may change.
+
+
+
 #### `virtual Status tensorflow::Session::Close()=0` {#virtual_Status_tensorflow_Session_Close}
 
 Closes this session.
diff --git a/tensorflow/g3doc/api_docs/cc/ClassStatus.md b/tensorflow/g3doc/api_docs/cc/ClassStatus.md
index 102ca2ea48..a5d332128b 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassStatus.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassStatus.md
@@ -1,32 +1,10 @@
-# Class `tensorflow::Status`
+# `class tensorflow::Status`
 
 
 
 
 
-##Member Summary
-
-* [`tensorflow::Status::Status()`](#tensorflow_Status_Status)
-  * Create a success status.
-* [`tensorflow::Status::~Status()`](#tensorflow_Status_Status)
-* [`tensorflow::Status::Status(tensorflow::error::Code code, tensorflow::StringPiece msg)`](#tensorflow_Status_Status)
-  * Create a status with the specified error code and msg as a human-readable string containing more detailed information.
-* [`tensorflow::Status::Status(const Status &s)`](#tensorflow_Status_Status)
-  * Copy the specified status.
-* [`void tensorflow::Status::operator=(const Status &s)`](#void_tensorflow_Status_operator_)
-* [`bool tensorflow::Status::ok() const`](#bool_tensorflow_Status_ok)
-  * Returns true iff the status indicates success.
-* [`tensorflow::error::Code tensorflow::Status::code() const`](#tensorflow_error_Code_tensorflow_Status_code)
-* [`const string& tensorflow::Status::error_message() const`](#const_string_tensorflow_Status_error_message)
-* [`bool tensorflow::Status::operator==(const Status &x) const`](#bool_tensorflow_Status_operator_)
-* [`bool tensorflow::Status::operator!=(const Status &x) const`](#bool_tensorflow_Status_operator_)
-* [`void tensorflow::Status::Update(const Status &new_status)`](#void_tensorflow_Status_Update)
-  * If ` ok() `, stores `new_status` into `*this`. If `!ok()`, preserves the current status, but may augment with additional information about `new_status`.
-* [`string tensorflow::Status::ToString() const`](#string_tensorflow_Status_ToString)
-  * Return a string representation of this status suitable for printing. Returns the string `"OK"` for success.
-* [`return tensorflow::Status::OK()`](#return_tensorflow_Status_OK)
-
-##Member Details
+###Member Details
 
 #### `tensorflow::Status::Status()` {#tensorflow_Status_Status}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensor.md b/tensorflow/g3doc/api_docs/cc/ClassTensor.md
index 92765aa7ad..812d2354c6 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassTensor.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensor.md
@@ -1,86 +1,10 @@
-# Class `tensorflow::Tensor`
+# `class tensorflow::Tensor`
 
 Represents an n-dimensional array of values.
 
 
 
-##Member Summary
-
-* [`tensorflow::Tensor::Tensor()`](#tensorflow_Tensor_Tensor)
-  * Default Tensor constructor. Creates a 1-dimension, 0-element float tensor.
-* [`tensorflow::Tensor::Tensor(DataType type, const TensorShape &shape)`](#tensorflow_Tensor_Tensor)
-  * Creates a Tensor of the given `type` and `shape`.
-* [`tensorflow::Tensor::Tensor(Allocator *a, DataType type, const TensorShape &shape)`](#tensorflow_Tensor_Tensor)
-  * Creates a tensor with the input `type` and `shape`, using the allocator `a` to allocate the underlying buffer.
-* [`tensorflow::Tensor::Tensor(Allocator *a, DataType type, const TensorShape &shape, const AllocationAttributes &allocation_attr)`](#tensorflow_Tensor_Tensor)
-  * Creates a tensor with the input `type` and `shape`, using the allocator `a` and the specified "allocation_attr" to allocate the underlying buffer.
-* [`tensorflow::Tensor::Tensor(DataType type)`](#tensorflow_Tensor_Tensor)
-  * Creates an uninitialized Tensor of the given data type.
-* [`tensorflow::Tensor::Tensor(const Tensor &other)`](#tensorflow_Tensor_Tensor)
-* [`tensorflow::Tensor::~Tensor()`](#tensorflow_Tensor_Tensor)
-  * Copy constructor.
-* [`DataType tensorflow::Tensor::dtype() const`](#DataType_tensorflow_Tensor_dtype)
-  * Returns the data type.
-* [`const TensorShape& tensorflow::Tensor::shape() const`](#const_TensorShape_tensorflow_Tensor_shape)
-  * Returns the shape of the tensor.
-* [`int tensorflow::Tensor::dims() const`](#int_tensorflow_Tensor_dims)
-  * Convenience accessor for the tensor shape.
-* [`int64 tensorflow::Tensor::dim_size(int d) const`](#int64_tensorflow_Tensor_dim_size)
-  * Convenience accessor for the tensor shape.
-* [`int64 tensorflow::Tensor::NumElements() const`](#int64_tensorflow_Tensor_NumElements)
-  * Convenience accessor for the tensor shape.
-* [`bool tensorflow::Tensor::IsSameSize(const Tensor &b) const`](#bool_tensorflow_Tensor_IsSameSize)
-* [`bool tensorflow::Tensor::SharesBufferWith(const Tensor &b) const`](#bool_tensorflow_Tensor_SharesBufferWith)
-* [`size_t tensorflow::Tensor::BufferHash() const`](#size_t_tensorflow_Tensor_BufferHash)
-* [`bool tensorflow::Tensor::IsInitialized() const`](#bool_tensorflow_Tensor_IsInitialized)
-  * Has this Tensor been initialized?
-* [`size_t tensorflow::Tensor::TotalBytes() const`](#size_t_tensorflow_Tensor_TotalBytes)
-  * Returns the estimated memory usage of this tensor.
-* [`Tensor& tensorflow::Tensor::operator=(const Tensor &other)`](#Tensor_tensorflow_Tensor_operator_)
-  * Assign operator. This tensor shares other&apos;s underlying storage.
-* [`bool tensorflow::Tensor::CopyFrom(const Tensor &other, const TensorShape &shape) TF_MUST_USE_RESULT`](#bool_tensorflow_Tensor_CopyFrom)
-  * Copy the other tensor into this tensor and reshape it.
-* [`Tensor tensorflow::Tensor::Slice(int64 dim0_start, int64 dim0_limit) const`](#Tensor_tensorflow_Tensor_Slice)
-  * Slice this tensor along the 1st dimension.
-* [`bool tensorflow::Tensor::FromProto(const TensorProto &other) TF_MUST_USE_RESULT`](#bool_tensorflow_Tensor_FromProto)
-  * Parse `other` and construct the tensor.
-* [`bool tensorflow::Tensor::FromProto(Allocator *a, const TensorProto &other) TF_MUST_USE_RESULT`](#bool_tensorflow_Tensor_FromProto)
-* [`void tensorflow::Tensor::AsProtoField(TensorProto *proto) const`](#void_tensorflow_Tensor_AsProtoField)
-  * Fills in `proto` with `*this` tensor&apos;s content.
-* [`void tensorflow::Tensor::AsProtoTensorContent(TensorProto *proto) const`](#void_tensorflow_Tensor_AsProtoTensorContent)
-* [`TTypes<T>::Vec tensorflow::Tensor::vec()`](#TTypes_T_Vec_tensorflow_Tensor_vec)
-  * Return the tensor data as an `Eigen::Tensor` with the type and sizes of this ` Tensor `.
-* [`TTypes<T>::Matrix tensorflow::Tensor::matrix()`](#TTypes_T_Matrix_tensorflow_Tensor_matrix)
-* [`TTypes< T, NDIMS >::Tensor tensorflow::Tensor::tensor()`](#TTypes_T_NDIMS_Tensor_tensorflow_Tensor_tensor)
-* [`TTypes<T>::Flat tensorflow::Tensor::flat()`](#TTypes_T_Flat_tensorflow_Tensor_flat)
-  * Return the tensor data as an `Eigen::Tensor` of the data type and a specified shape.
-* [`TTypes<T>::UnalignedFlat tensorflow::Tensor::unaligned_flat()`](#TTypes_T_UnalignedFlat_tensorflow_Tensor_unaligned_flat)
-* [`TTypes<T>::Matrix tensorflow::Tensor::flat_inner_dims()`](#TTypes_T_Matrix_tensorflow_Tensor_flat_inner_dims)
-* [`TTypes<T>::Matrix tensorflow::Tensor::flat_outer_dims()`](#TTypes_T_Matrix_tensorflow_Tensor_flat_outer_dims)
-* [`TTypes< T, NDIMS >::Tensor tensorflow::Tensor::shaped(gtl::ArraySlice< int64 > new_sizes)`](#TTypes_T_NDIMS_Tensor_tensorflow_Tensor_shaped)
-* [`TTypes< T, NDIMS >::UnalignedTensor tensorflow::Tensor::unaligned_shaped(gtl::ArraySlice< int64 > new_sizes)`](#TTypes_T_NDIMS_UnalignedTensor_tensorflow_Tensor_unaligned_shaped)
-* [`TTypes< T >::Scalar tensorflow::Tensor::scalar()`](#TTypes_T_Scalar_tensorflow_Tensor_scalar)
-  * Return the Tensor data as a `TensorMap` of fixed size 1: `TensorMap<TensorFixedSize<T, 1>>`.
-* [`TTypes<T>::ConstVec tensorflow::Tensor::vec() const`](#TTypes_T_ConstVec_tensorflow_Tensor_vec)
-  * Const versions of all the methods above.
-* [`TTypes<T>::ConstMatrix tensorflow::Tensor::matrix() const`](#TTypes_T_ConstMatrix_tensorflow_Tensor_matrix)
-* [`TTypes< T, NDIMS >::ConstTensor tensorflow::Tensor::tensor() const`](#TTypes_T_NDIMS_ConstTensor_tensorflow_Tensor_tensor)
-* [`TTypes<T>::ConstFlat tensorflow::Tensor::flat() const`](#TTypes_T_ConstFlat_tensorflow_Tensor_flat)
-* [`TTypes<T>::UnalignedConstFlat tensorflow::Tensor::unaligned_flat() const`](#TTypes_T_UnalignedConstFlat_tensorflow_Tensor_unaligned_flat)
-* [`TTypes<T>::ConstMatrix tensorflow::Tensor::flat_inner_dims() const`](#TTypes_T_ConstMatrix_tensorflow_Tensor_flat_inner_dims)
-* [`TTypes<T>::ConstMatrix tensorflow::Tensor::flat_outer_dims() const`](#TTypes_T_ConstMatrix_tensorflow_Tensor_flat_outer_dims)
-* [`TTypes< T, NDIMS >::ConstTensor tensorflow::Tensor::shaped(gtl::ArraySlice< int64 > new_sizes) const`](#TTypes_T_NDIMS_ConstTensor_tensorflow_Tensor_shaped)
-* [`TTypes< T, NDIMS >::UnalignedConstTensor tensorflow::Tensor::unaligned_shaped(gtl::ArraySlice< int64 > new_sizes) const`](#TTypes_T_NDIMS_UnalignedConstTensor_tensorflow_Tensor_unaligned_shaped)
-* [`TTypes< T >::ConstScalar tensorflow::Tensor::scalar() const`](#TTypes_T_ConstScalar_tensorflow_Tensor_scalar)
-* [`string tensorflow::Tensor::SummarizeValue(int64 max_entries) const`](#string_tensorflow_Tensor_SummarizeValue)
-  * Render the first `max_entries` values in `*this` into a string.
-* [`string tensorflow::Tensor::DebugString() const`](#string_tensorflow_Tensor_DebugString)
-  * A human-readable summary of the tensor suitable for debugging.
-* [`void tensorflow::Tensor::FillDescription(TensorDescription *description) const`](#void_tensorflow_Tensor_FillDescription)
-* [`StringPiece tensorflow::Tensor::tensor_data() const`](#StringPiece_tensorflow_Tensor_tensor_data)
-  * Returns a ` StringPiece ` mapping the current tensor&apos;s buffer.
-
-##Member Details
+###Member Details
 
 #### `tensorflow::Tensor::Tensor()` {#tensorflow_Tensor_Tensor}
 
@@ -140,7 +64,7 @@ Returns the shape of the tensor.
 
 Convenience accessor for the tensor shape.
 
-For all shape accessors, see comments for relevant methods of ` TensorShape ` in `tensor_shape.h`.
+For all shape accessors, see comments for relevant methods of ` TensorShape ` in ` tensor_shape.h `.
 
 #### `int64 tensorflow::Tensor::dim_size(int d) const` {#int64_tensorflow_Tensor_dim_size}
 
@@ -238,15 +162,7 @@ Use these methods when you know the data type and the number of dimensions of th
 
 Example:
 
-```c++ typedef float T;
-Tensor my_mat(...built with Shape{rows: 3, cols: 5}...);
-auto mat = my_mat.matrix<T>();    // 2D Eigen::Tensor, 3 x 5.
-auto mat = my_mat.tensor<T, 2>(); // 2D Eigen::Tensor, 3 x 5.
-auto vec = my_mat.vec<T>();       // CHECK fails as my_mat is 2D.
-auto vec = my_mat.tensor<T, 3>(); // CHECK fails as my_mat is 2D.
-auto mat = my_mat.matrix<int32>();// CHECK fails as type mismatch.
-
-```
+{c++}   typedef float T;  Tensor my_mat(...built with Shape{rows: 3, cols: 5}...);  auto mat = my_mat.matrix<T>(); // 2D Eigen::Tensor, 3 x 5.  auto mat = my_mat.tensor<T, 2>(); // 2D Eigen::Tensor, 3 x 5.  auto vec = my_mat.vec<T>(); // CHECK fails as my_mat is 2D.  auto vec = my_mat.tensor<T, 3>(); // CHECK fails as my_mat is 2D.  auto mat = my_mat.matrix<int32>();// CHECK fails as type mismatch.
 
 #### `TTypes<T>::Matrix tensorflow::Tensor::matrix()` {#TTypes_T_Matrix_tensorflow_Tensor_matrix}
 
@@ -268,22 +184,7 @@ These methods allow you to access the data with the dimensions and sizes of your
 
 Example:
 
-```c++ typedef float T;
-Tensor my_ten(...built with Shape{planes: 4, rows: 3, cols: 5}...);
-// 1D Eigen::Tensor, size 60:
-auto flat = my_ten.flat<T>();
-// 2D Eigen::Tensor 12 x 5:
-auto inner = my_ten.flat_inner_dims<T>();
-// 2D Eigen::Tensor 4 x 15:
-auto outer = my_ten.shaped<T, 2>({4, 15});
-// CHECK fails, bad num elements:
-auto outer = my_ten.shaped<T, 2>({4, 8});
-// 3D Eigen::Tensor 6 x 5 x 2:
-auto weird = my_ten.shaped<T, 3>({6, 5, 2});
-// CHECK fails, type mismatch:
-auto bad   = my_ten.flat<int32>();
-
-```
+{c++}   typedef float T;  Tensor my_ten(...built with Shape{planes: 4, rows: 3, cols: 5}...);  // 1D Eigen::Tensor, size 60:  auto flat = my_ten.flat<T>();  // 2D Eigen::Tensor 12 x 5:  auto inner = my_ten.flat_inner_dims<T>();  // 2D Eigen::Tensor 4 x 15:  auto outer = my_ten.shaped<T, 2>({4, 15});  // CHECK fails, bad num elements:  auto outer = my_ten.shaped<T, 2>({4, 8});  // 3D Eigen::Tensor 6 x 5 x 2:  auto weird = my_ten.shaped<T, 3>({6, 5, 2});  // CHECK fails, type mismatch:  auto bad = my_ten.flat<int32>();
 
 #### `TTypes<T>::UnalignedFlat tensorflow::Tensor::unaligned_flat()` {#TTypes_T_UnalignedFlat_tensorflow_Tensor_unaligned_flat}
 
@@ -407,4 +308,4 @@ The returned ` StringPiece ` may point to memory location on devices that the CP
 
 NOTE: The underlying tensor buffer is refcounted, so the lifetime of the contents mapped by the ` StringPiece ` matches the lifetime of the buffer; callers should arrange to make sure the buffer does not get destroyed while the ` StringPiece ` is still used.
 
-REQUIRES: `DataTypeCanUseMemcpy( dtype() )`.
+REQUIRES: `DataTypeCanUseMemcpy(dtype())`.
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md b/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md
index 23e7c40397..def4232b9f 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md
@@ -1,57 +1,22 @@
-# Class `tensorflow::TensorShape`
-
-Manages the dimensions of a Tensor and their sizes.
-
-
-
-##Member Summary
-
-* [`tensorflow::TensorShape::TensorShape(gtl::ArraySlice< int64 > dim_sizes)`](#tensorflow_TensorShape_TensorShape)
-  * Construct a ` TensorShape ` from the provided sizes. REQUIRES: `dim_sizes[i] >= 0`
-* [`tensorflow::TensorShape::TensorShape(std::initializer_list< int64 > dim_sizes)`](#tensorflow_TensorShape_TensorShape)
-* [`tensorflow::TensorShape::TensorShape(const TensorShapeProto &proto)`](#tensorflow_TensorShape_TensorShape)
-  * REQUIRES: `IsValid(proto)`
-* [`tensorflow::TensorShape::TensorShape()`](#tensorflow_TensorShape_TensorShape)
-* [`void tensorflow::TensorShape::Clear()`](#void_tensorflow_TensorShape_Clear)
-  * Clear a tensor shape.
-* [`void tensorflow::TensorShape::AddDim(int64 size)`](#void_tensorflow_TensorShape_AddDim)
-  * Add a dimension to the end ("inner-most"). REQUIRES: `size >= 0`
-* [`void tensorflow::TensorShape::AppendShape(const TensorShape &shape)`](#void_tensorflow_TensorShape_AppendShape)
-  * Appends all the dimensions from `shape`.
-* [`void tensorflow::TensorShape::InsertDim(int d, int64 size)`](#void_tensorflow_TensorShape_InsertDim)
-  * Insert a dimension somewhere in the ` TensorShape `. REQUIRES: `0 <= d <= dims() ` REQUIRES: `size >= 0`
-* [`void tensorflow::TensorShape::set_dim(int d, int64 size)`](#void_tensorflow_TensorShape_set_dim)
-  * Modifies the size of the dimension `d` to be `size` REQUIRES: `0 <= d < dims() ` REQUIRES: `size >= 0`
-* [`void tensorflow::TensorShape::RemoveDim(int d)`](#void_tensorflow_TensorShape_RemoveDim)
-  * Removes dimension `d` from the ` TensorShape `. REQUIRES: `0 <= d < dims() `
-* [`int tensorflow::TensorShape::dims() const`](#int_tensorflow_TensorShape_dims)
-  * Return the number of dimensions in the tensor.
-* [`int64 tensorflow::TensorShape::dim_size(int d) const`](#int64_tensorflow_TensorShape_dim_size)
-  * Returns the number of elements in dimension `d`. REQUIRES: `0 <= d < dims() `
-* [`gtl::ArraySlice<int64> tensorflow::TensorShape::dim_sizes() const`](#gtl_ArraySlice_int64_tensorflow_TensorShape_dim_sizes)
-  * Returns sizes of all dimensions.
-* [`int64 tensorflow::TensorShape::num_elements() const`](#int64_tensorflow_TensorShape_num_elements)
-  * Returns the number of elements in the tensor.
-* [`bool tensorflow::TensorShape::IsSameSize(const TensorShape &b) const`](#bool_tensorflow_TensorShape_IsSameSize)
-* [`bool tensorflow::TensorShape::operator==(const TensorShape &b) const`](#bool_tensorflow_TensorShape_operator_)
-* [`void tensorflow::TensorShape::AsProto(TensorShapeProto *proto) const`](#void_tensorflow_TensorShape_AsProto)
-  * Fill `*proto` from `*this`.
-* [`Eigen::DSizes< Eigen::DenseIndex, NDIMS > tensorflow::TensorShape::AsEigenDSizes() const`](#Eigen_DSizes_Eigen_DenseIndex_NDIMS_tensorflow_TensorShape_AsEigenDSizes)
-  * Fill `*dsizes` from `*this`.
-* [`Eigen::DSizes< Eigen::DenseIndex, NDIMS > tensorflow::TensorShape::AsEigenDSizesWithPadding() const`](#Eigen_DSizes_Eigen_DenseIndex_NDIMS_tensorflow_TensorShape_AsEigenDSizesWithPadding)
-* [`TensorShapeIter tensorflow::TensorShape::begin() const`](#TensorShapeIter_tensorflow_TensorShape_begin)
-  * For iterating through the dimensions.
-* [`TensorShapeIter tensorflow::TensorShape::end() const`](#TensorShapeIter_tensorflow_TensorShape_end)
-* [`string tensorflow::TensorShape::DebugString() const`](#string_tensorflow_TensorShape_DebugString)
-  * For error messages.
-* [`string tensorflow::TensorShape::ShortDebugString() const`](#string_tensorflow_TensorShape_ShortDebugString)
-  * Same as DebugString()
-* [`bool tensorflow::TensorShape::IsValid(const TensorShapeProto &proto)`](#bool_tensorflow_TensorShape_IsValid)
-  * Returns `true` iff `proto` is a valid tensor shape.
-* [`Status tensorflow::TensorShape::IsValidShape(const TensorShapeProto &proto)`](#Status_tensorflow_TensorShape_IsValidShape)
-* [`string tensorflow::TensorShape::ShortDebugString(const TensorShapeProto &proto)`](#string_tensorflow_TensorShape_ShortDebugString)
-
-##Member Details
+# `class tensorflow::TensorShape`
+
+
+
+
+
+###Member Details
+
+#### `uint8 tensorflow::TensorShape::buf[16][16]` {#uint8_tensorflow_TensorShape_buf_16_}
+
+
+
+
+
+#### `Rep64* tensorflow::TensorShape::unused_aligner` {#Rep64_tensorflow_TensorShape_unused_aligner}
+
+
+
+
 
 #### `tensorflow::TensorShape::TensorShape(gtl::ArraySlice< int64 > dim_sizes)` {#tensorflow_TensorShape_TensorShape}
 
@@ -77,6 +42,24 @@ REQUIRES: `IsValid(proto)`
 
 Create a tensor shape with no dimensions and one element, which you can then call ` AddDim() ` on.
 
+#### `tensorflow::TensorShape::~TensorShape()` {#tensorflow_TensorShape_TensorShape}
+
+
+
+
+
+#### `tensorflow::TensorShape::TensorShape(const TensorShape &b)` {#tensorflow_TensorShape_TensorShape}
+
+Copy the specified shape.
+
+
+
+#### `void tensorflow::TensorShape::operator=(const TensorShape &b)` {#void_tensorflow_TensorShape_operator_}
+
+
+
+
+
 #### `void tensorflow::TensorShape::Clear()` {#void_tensorflow_TensorShape_Clear}
 
 Clear a tensor shape.
@@ -125,7 +108,7 @@ Returns the number of elements in dimension `d`. REQUIRES: `0 <= d < dims() `
 
 
 
-#### `gtl::ArraySlice<int64> tensorflow::TensorShape::dim_sizes() const` {#gtl_ArraySlice_int64_tensorflow_TensorShape_dim_sizes}
+#### `gtl::InlinedVector< int64, 4 > tensorflow::TensorShape::dim_sizes() const` {#gtl_InlinedVector_int64_4_tensorflow_TensorShape_dim_sizes}
 
 Returns sizes of all dimensions.
 
@@ -185,9 +168,9 @@ For error messages.
 
 
 
-#### `string tensorflow::TensorShape::ShortDebugString() const` {#string_tensorflow_TensorShape_ShortDebugString}
+#### `void tensorflow::TensorShape::DumpRep() const` {#void_tensorflow_TensorShape_DumpRep}
+
 
-Same as DebugString()
 
 
 
@@ -203,8 +186,8 @@ Returns `true` iff `proto` is a valid tensor shape.
 
 Returns `OK` iff `proto` is a valid tensor shape, and a descriptive error status otherwise.
 
-#### `string tensorflow::TensorShape::ShortDebugString(const TensorShapeProto &proto)` {#string_tensorflow_TensorShape_ShortDebugString}
+#### `string tensorflow::TensorShape::DebugString(const TensorShapeProto &proto)` {#string_tensorflow_TensorShape_DebugString}
 
 
 
-Same as `TensorShape(proto). ShortDebugString() ` but doesn&apos;t crash for invalid protos.
+Same as `TensorShape(proto). DebugString() ` but doesn&apos;t crash for invalid protos.
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md b/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md
index a127e962c9..93f1230315 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md
@@ -1,22 +1,10 @@
-# Class `tensorflow::TensorShapeUtils`
+# `class tensorflow::TensorShapeUtils`
 
 Static helper routines for ` TensorShape `. Includes a few common predicates on a tensor shape.
 
 
 
-##Member Summary
-
-* [`static bool tensorflow::TensorShapeUtils::IsScalar(const TensorShape &shape)`](#static_bool_tensorflow_TensorShapeUtils_IsScalar)
-* [`static bool tensorflow::TensorShapeUtils::IsVector(const TensorShape &shape)`](#static_bool_tensorflow_TensorShapeUtils_IsVector)
-* [`static bool tensorflow::TensorShapeUtils::IsVectorOrHigher(const TensorShape &shape)`](#static_bool_tensorflow_TensorShapeUtils_IsVectorOrHigher)
-* [`static bool tensorflow::TensorShapeUtils::IsMatrix(const TensorShape &shape)`](#static_bool_tensorflow_TensorShapeUtils_IsMatrix)
-* [`static bool tensorflow::TensorShapeUtils::IsMatrixOrHigher(const TensorShape &shape)`](#static_bool_tensorflow_TensorShapeUtils_IsMatrixOrHigher)
-* [`static Status tensorflow::TensorShapeUtils::MakeShape(const T *dims, int n, TensorShape *out)`](#static_Status_tensorflow_TensorShapeUtils_MakeShape)
-  * Returns a ` TensorShape ` whose dimensions are `dims[0]`, `dims[1]`, ..., `dims[n-1]`.
-* [`static string tensorflow::TensorShapeUtils::ShapeListString(const gtl::ArraySlice< TensorShape > &shapes)`](#static_string_tensorflow_TensorShapeUtils_ShapeListString)
-* [`bool tensorflow::TensorShapeUtils::StartsWith(const TensorShape &shape0, const TensorShape &shape1)`](#bool_tensorflow_TensorShapeUtils_StartsWith)
-
-##Member Details
+###Member Details
 
 #### `static bool tensorflow::TensorShapeUtils::IsScalar(const TensorShape &shape)` {#static_bool_tensorflow_TensorShapeUtils_IsScalar}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassThread.md b/tensorflow/g3doc/api_docs/cc/ClassThread.md
index ed9c55eacc..526353ec20 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassThread.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassThread.md
@@ -1,16 +1,10 @@
-# Class `tensorflow::Thread`
+# `class tensorflow::Thread`
 
 
 
 
 
-##Member Summary
-
-* [`tensorflow::Thread::Thread()`](#tensorflow_Thread_Thread)
-* [`tensorflow::Thread::~Thread()`](#tensorflow_Thread_Thread)
-  * Blocks until the thread of control stops running.
-
-##Member Details
+###Member Details
 
 #### `tensorflow::Thread::Thread()` {#tensorflow_Thread_Thread}
 
diff --git a/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md b/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md
index f7784dfdf3..a7e250d697 100644
--- a/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md
+++ b/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md
@@ -1,19 +1,10 @@
-# Class `tensorflow::WritableFile`
+# `class tensorflow::WritableFile`
 
 A file abstraction for sequential writing.
 
 The implementation must provide buffering since callers may append small fragments at a time to the file.
 
-##Member Summary
-
-* [`tensorflow::WritableFile::WritableFile()`](#tensorflow_WritableFile_WritableFile)
-* [`tensorflow::WritableFile::~WritableFile()`](#tensorflow_WritableFile_WritableFile)
-* [`virtual Status tensorflow::WritableFile::Append(const StringPiece &data)=0`](#virtual_Status_tensorflow_WritableFile_Append)
-* [`virtual Status tensorflow::WritableFile::Close()=0`](#virtual_Status_tensorflow_WritableFile_Close)
-* [`virtual Status tensorflow::WritableFile::Flush()=0`](#virtual_Status_tensorflow_WritableFile_Flush)
-* [`virtual Status tensorflow::WritableFile::Sync()=0`](#virtual_Status_tensorflow_WritableFile_Sync)
-
-##Member Details
+###Member Details
 
 #### `tensorflow::WritableFile::WritableFile()` {#tensorflow_WritableFile_WritableFile}
 
diff --git a/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md b/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md
index 10c8204921..f0dbe1a304 100644
--- a/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md
+++ b/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md
@@ -1,20 +1,10 @@
-# Struct `tensorflow::SessionOptions`
+# `struct tensorflow::SessionOptions`
 
 Configuration information for a Session .
 
 
 
-##Member Summary
-
-* [`Env* tensorflow::SessionOptions::env`](#Env_tensorflow_SessionOptions_env)
-  * The environment to use.
-* [`string tensorflow::SessionOptions::target`](#string_tensorflow_SessionOptions_target)
-  * The TensorFlow runtime to connect to.
-* [`ConfigProto tensorflow::SessionOptions::config`](#ConfigProto_tensorflow_SessionOptions_config)
-  * Configuration options.
-* [`tensorflow::SessionOptions::SessionOptions()`](#tensorflow_SessionOptions_SessionOptions)
-
-##Member Details
+###Member Details
 
 #### `Env* tensorflow::SessionOptions::env` {#Env_tensorflow_SessionOptions_env}
 
diff --git a/tensorflow/g3doc/api_docs/cc/StructState.md b/tensorflow/g3doc/api_docs/cc/StructState.md
index 903d03fa97..a0335b20e0 100644
--- a/tensorflow/g3doc/api_docs/cc/StructState.md
+++ b/tensorflow/g3doc/api_docs/cc/StructState.md
@@ -1,15 +1,10 @@
-# Struct `tensorflow::Status::State`
+# `struct tensorflow::Status::State`
 
 
 
 
 
-##Member Summary
-
-* [`tensorflow::error::Code tensorflow::Status::State::code`](#tensorflow_error_Code_tensorflow_Status_State_code)
-* [`string tensorflow::Status::State::msg`](#string_tensorflow_Status_State_msg)
-
-##Member Details
+###Member Details
 
 #### `tensorflow::error::Code tensorflow::Status::State::code` {#tensorflow_error_Code_tensorflow_Status_State_code}
 
diff --git a/tensorflow/g3doc/api_docs/cc/StructTF_Buffer.md b/tensorflow/g3doc/api_docs/cc/StructTF_Buffer.md
index e0e46a8794..3f6ffa349c 100644
--- a/tensorflow/g3doc/api_docs/cc/StructTF_Buffer.md
+++ b/tensorflow/g3doc/api_docs/cc/StructTF_Buffer.md
@@ -1,15 +1,10 @@
-# Struct `TF_Buffer`
+# `struct TF_Buffer`
 
 
 
 
 
-##Member Summary
-
-* [`const void* TF_Buffer::data`](#const_void_TF_Buffer_data)
-* [`size_t TF_Buffer::length`](#size_t_TF_Buffer_length)
-
-##Member Details
+###Member Details
 
 #### `const void* TF_Buffer::data` {#const_void_TF_Buffer_data}
 
diff --git a/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md b/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md
index b325bc6937..f2471b1988 100644
--- a/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md
+++ b/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md
@@ -1,17 +1,12 @@
-# Struct `tensorflow::TensorShapeDim`
+# `struct tensorflow::TensorShapeDim`
 
 
 
 
 
-##Member Summary
+###Member Details
 
-* [`int tensorflow::TensorShapeDim::size`](#int_tensorflow_TensorShapeDim_size)
-* [`tensorflow::TensorShapeDim::TensorShapeDim(int64 s)`](#tensorflow_TensorShapeDim_TensorShapeDim)
-
-##Member Details
-
-#### `int tensorflow::TensorShapeDim::size` {#int_tensorflow_TensorShapeDim_size}
+#### `int64 tensorflow::TensorShapeDim::size` {#int64_tensorflow_TensorShapeDim_size}
 
 
 
diff --git a/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md b/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md
index 56b501acf8..35db265ecd 100644
--- a/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md
+++ b/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md
@@ -1,17 +1,10 @@
-# Struct `tensorflow::ThreadOptions`
+# `struct tensorflow::ThreadOptions`
 
 Options to configure a Thread .
 
 Note that the options are all hints, and the underlying implementation may choose to ignore it.
 
-##Member Summary
-
-* [`size_t tensorflow::ThreadOptions::stack_size`](#size_t_tensorflow_ThreadOptions_stack_size)
-  * Thread stack size to use (in bytes).
-* [`size_t tensorflow::ThreadOptions::guard_size`](#size_t_tensorflow_ThreadOptions_guard_size)
-  * Guard area size to use near thread stacks to use (in bytes)
-
-##Member Details
+###Member Details
 
 #### `size_t tensorflow::ThreadOptions::stack_size` {#size_t_tensorflow_ThreadOptions_stack_size}
 
diff --git a/tensorflow/g3doc/api_docs/cc/index.md b/tensorflow/g3doc/api_docs/cc/index.md
index 56b842b57d..c84025ce59 100644
--- a/tensorflow/g3doc/api_docs/cc/index.md
+++ b/tensorflow/g3doc/api_docs/cc/index.md
@@ -25,56 +25,33 @@ write the graph to a file.
 
 ## Env
 
-* [tensorflow::Env](ClassEnv.md)
-* [tensorflow::RandomAccessFile](ClassRandomAccessFile.md)
-* [tensorflow::WritableFile](ClassWritableFile.md)
-* [tensorflow::EnvWrapper](ClassEnvWrapper.md)
+* [tensorflow::Env](classEnv.md)
+* [tensorflow::RandomAccessFile](classRandomAccessFile.md)
+* [tensorflow::WritableFile](classWritableFile.md)
+* [tensorflow::EnvWrapper](classEnvWrapper.md)
 
 ## Session
 
-* [tensorflow::Session](ClassSession.md)
-* [tensorflow::SessionOptions](StructSessionOptions.md)
+* [tensorflow::Session](classSession.md)
+* [tensorflow::SessionOptions](structSessionOptions.md)
 
 ## Status
 
-* [tensorflow::Status](ClassStatus.md)
-* [tensorflow::Status::State](StructState.md)
+* [tensorflow::Status](classStatus.md)
+* [tensorflow::Status::State](structState.md)
 
 ## Tensor
 
-* [tensorflow::Tensor](ClassTensor.md)
-* [tensorflow::TensorShape](ClassTensorShape.md)
-* [tensorflow::TensorShapeDim](StructTensorShapeDim.md)
-* [tensorflow::TensorShapeUtils](ClassTensorShapeUtils.md)
-* [tensorflow::PartialTensorShape](ClassPartialTensorShape.md)
-* [tensorflow::PartialTensorShapeUtils](ClassPartialTensorShapeUtils.md)
-* [TF_Buffer](StructTF_Buffer.md)
+* [tensorflow::Tensor](classTensor.md)
+* [tensorflow::TensorShape](classTensorShape.md)
+* [tensorflow::TensorShapeDim](structTensorShapeDim.md)
+* [tensorflow::TensorShapeUtils](classTensorShapeUtils.md)
+* [tensorflow::PartialTensorShape](classPartialTensorShape.md)
+* [tensorflow::PartialTensorShapeUtils](classPartialTensorShapeUtils.md)
+* [TF_Buffer](structTF_Buffer.md)
 
 ## Thread
 
-* [tensorflow::Thread](ClassThread.md)
-* [tensorflow::ThreadOptions](StructThreadOptions.md)
+* [tensorflow::Thread](classThread.md)
+* [tensorflow::ThreadOptions](structThreadOptions.md)
 
-
-
-<div class='sections-order' style="display: none;">
-<!--
-<!-- ClassEnv.md -->
-<!-- ClassRandomAccessFile.md -->
-<!-- ClassWritableFile.md -->
-<!-- ClassEnvWrapper.md -->
-<!-- ClassSession.md -->
-<!-- StructSessionOptions.md -->
-<!-- ClassStatus.md -->
-<!-- StructState.md -->
-<!-- ClassTensor.md -->
-<!-- ClassTensorShape.md -->
-<!-- StructTensorShapeDim.md -->
-<!-- ClassTensorShapeUtils.md -->
-<!-- ClassPartialTensorShape.md -->
-<!-- ClassPartialTensorShapeUtils.md -->
-<!-- StructTF_Buffer.md -->
-<!-- ClassThread.md -->
-<!-- StructThreadOptions.md -->
--->
-</div>
diff --git a/tensorflow/g3doc/api_docs/leftnav_files b/tensorflow/g3doc/api_docs/leftnav_files
index ceeb88af8f..1f2ac0da96 100644
--- a/tensorflow/g3doc/api_docs/leftnav_files
+++ b/tensorflow/g3doc/api_docs/leftnav_files
@@ -13,7 +13,11 @@ python/python_io.md
 python/nn.md
 python/client.md
 python/train.md
-### [C++ API](/api_docs/cc/index.md)
+python/script_ops.md
+python/test.md
+python/contrib.layers.md
+python/contrib.util.md
+>>> [C++ API](/api_docs/cc/index.md)
 cc/ClassEnv.md
 cc/ClassRandomAccessFile.md
 cc/ClassWritableFile.md
@@ -26,5 +30,7 @@ cc/ClassTensor.md
 cc/ClassTensorShape.md
 cc/StructTensorShapeDim.md
 cc/ClassTensorShapeUtils.md
+cc/ClassPartialTensorShape.md
+cc/ClassPartialTensorShapeUtils.md
 cc/ClassThread.md
-cc/StructThreadOptions.md
-\ No newline at end of file
+cc/StructThreadOptions.md
diff --git a/tensorflow/g3doc/api_docs/python/array_ops.md b/tensorflow/g3doc/api_docs/python/array_ops.md
index 6be584d932..bd01b7afca 100644
--- a/tensorflow/g3doc/api_docs/python/array_ops.md
+++ b/tensorflow/g3doc/api_docs/python/array_ops.md
@@ -3,7 +3,7 @@
 # Tensor Transformations
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
diff --git a/tensorflow/g3doc/api_docs/python/constant_op.md b/tensorflow/g3doc/api_docs/python/constant_op.md
index 4a02f041c9..4cffb39fd8 100644
--- a/tensorflow/g3doc/api_docs/python/constant_op.md
+++ b/tensorflow/g3doc/api_docs/python/constant_op.md
@@ -3,7 +3,7 @@
 # Constants, Sequences, and Random Values
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
diff --git a/tensorflow/g3doc/api_docs/python/contrib.layers.md b/tensorflow/g3doc/api_docs/python/contrib.layers.md
new file mode 100644
index 0000000000..d9351e47ed
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/contrib.layers.md
@@ -0,0 +1,379 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Layers (contrib)
+[TOC]
+
+Ops for building neural network layers, regularizers, summaries, etc.
+
+## Higher level ops for building neural network layers.
+
+This package provides several ops that take care of creating variables that are
+used internally in a consistent way and provide the building blocks for many
+common machine learning algorithms.
+
+- - -
+
+### `tf.contrib.layers.convolution2d(x, num_output_channels, kernel_size, activation_fn=None, stride=(1, 1), padding='SAME', weight_init=_initializer, bias_init=_initializer, name=None, weight_collections=None, bias_collections=None, output_collections=None, weight_regularizer=None, bias_regularizer=None)` {#convolution2d}
+
+Adds the parameters for a conv2d layer and returns the output.
+
+A neural network convolution layer is generally defined as:
+\\(y = f(conv2d(w, x) + b)\\) where **f** is given by `activation_fn`,
+**conv2d** is `tf.nn.conv2d` and `x` has shape
+`[batch, height, width, channels]`. The output of this op is of shape
+`[batch, out_height, out_width, num_output_channels]`, where `out_width` and
+`out_height` are determined by the `padding` argument. See `tf.nn.conv2d` for
+details.
+
+This op creates `w` and optionally `b` and adds various summaries that can be
+useful for visualizing learning or diagnosing training problems. Bias can be
+disabled by setting `bias_init` to `None`.
+
+The variable creation is compatible with `tf.variable_scope` and so can be
+reused with `tf.variable_scope` or `tf.make_template`.
+
+Most of the details of variable creation can be controlled by specifying the
+initializers (`weight_init` and `bias_init`) and which collections to place
+the created variables in (`weight_collections` and `bias_collections`).
+
+A per layer regularization can be specified by setting `weight_regularizer`.
+This is only applied to weights and not the bias.
+
+##### Args:
+
+
+*  <b>`x`</b>: A 4-D input `Tensor`.
+*  <b>`num_output_channels`</b>: The number of output channels (i.e. the size of the
+    last dimension of the output).
+*  <b>`kernel_size`</b>: A length 2 `list` or `tuple` containing the kernel size.
+*  <b>`activation_fn`</b>: A function that takes a single `Tensor` and is applied as a
+    non-linearity.
+*  <b>`stride`</b>: A length 2 `list` or `tuple` specifying the stride of the sliding
+    window across the image.
+*  <b>`padding`</b>: A `string` from: "SAME", "VALID". The type of padding algorithm to
+    use.
+*  <b>`weight_init`</b>: An optional initialization. If not specified, uses Xavier
+    initialization (see `tf.contrib.layers.xavier_initializer`).
+*  <b>`bias_init`</b>: An initializer for the bias, defaults to 0. Set to `None` in order
+    to disable bias.
+*  <b>`name`</b>: The name for this operation is used to name operations and to find
+    variables. If specified it must be unique for this scope, otherwise a
+    unique name starting with "convolution2d" will be created.  See
+    `tf.variable_op_scope` for details.
+*  <b>`weight_collections`</b>: List of graph collections to which weights are added.
+*  <b>`bias_collections`</b>: List of graph collections to which biases are added.
+*  <b>`output_collections`</b>: List of graph collections to which outputs are added.
+*  <b>`weight_regularizer`</b>: A regularizer like the result of
+    `l1_regularizer` or `l2_regularizer`. Used for weights.
+*  <b>`bias_regularizer`</b>: A regularizer like the result of
+    `l1_regularizer` or `l2_regularizer`. Used for biases.
+
+##### Returns:
+
+  The result of applying a 2-D convolutional layer.
+
+##### Raises:
+
+
+*  <b>`ValueError`</b>: If `kernel_size` or `stride` are not length 2.
+
+
+- - -
+
+### `tf.contrib.layers.fully_connected(x, num_output_units, activation_fn=None, weight_init=_initializer, bias_init=_initializer, name=None, weight_collections=('weights',), bias_collections=('biases',), output_collections=('activations',), weight_regularizer=None, bias_regularizer=None)` {#fully_connected}
+
+Adds the parameters for a fully connected layer and returns the output.
+
+A fully connected layer is generally defined as a matrix multiply:
+`y = f(w * x + b)` where `f` is given by `activation_fn`. If
+`activation_fn` is `None`, the result of `y = w * x + b` is
+returned.
+
+This op creates `w` and optionally `b`. Bias (`b`) can be disabled by setting
+`bias_init` to `None`.
+
+The variable creation is compatible with `tf.variable_scope` and so can be
+reused with `tf.variable_scope` or `tf.make_template`.
+
+Most of the details of variable creation can be controlled by specifying the
+initializers (`weight_init` and `bias_init`) and which collections to place
+the created variables in (`weight_collections` and `bias_collections`; note that
+the variables are always added to the `VARIABLES` collection). The output of
+the layer can be placed in custom collections using `output_collections`.
+The collections arguments default to `WEIGHTS`, `BIASES` and `ACTIVATIONS`,
+respectively.
+
+A per layer regularization can be specified by setting `weight_regularizer`
+and `bias_regularizer`, which are applied to the weights and biases
+respectively, and whose output is added to the `REGULARIZATION_LOSSES`
+collection.
+
+##### Args:
+
+
+*  <b>`x`</b>: The input `Tensor`.
+*  <b>`num_output_units`</b>: The size of the output.
+*  <b>`activation_fn`</b>: A function that takes a single `Tensor` and is applied as a
+    non-linearity. If `None`, no activation is applied.
+*  <b>`weight_init`</b>: An optional weight initialization, defaults to
+    `xavier_initializer`.
+*  <b>`bias_init`</b>: An initializer for the bias, defaults to 0. Set to `None` in
+    order to disable bias.
+*  <b>`name`</b>: The name for this operation is used to name operations and to find
+    variables. If specified it must be unique for this scope, otherwise a
+    unique name starting with "fully_connected" will be created.  See
+    `tf.variable_op_scope` for details.
+*  <b>`weight_collections`</b>: List of graph collections to which weights are added.
+*  <b>`bias_collections`</b>: List of graph collections to which biases are added.
+*  <b>`output_collections`</b>: List of graph collections to which outputs are added.
+*  <b>`weight_regularizer`</b>: A regularizer like the result of
+    `l1_regularizer` or `l2_regularizer`. Used for weights.
+*  <b>`bias_regularizer`</b>: A regularizer like the result of
+    `l1_regularizer` or `l2_regularizer`. Used for biases.
+
+##### Returns:
+
+  The output of the fully connected layer.
+
+
+
+Aliases for `fully_connected` that set a default activation function are
+available: `relu`, `relu6` and `linear`.
+
+## Regularizers
+
+Regularization can help prevent overfitting. These have the signature
+`fn(weights)`. The loss is typically added to `tf.GraphKeys.REGULARIZATION_LOSSES`.
+
+- - -
+
+### `tf.contrib.layers.l1_regularizer(scale)` {#l1_regularizer}
+
+Returns a function that can be used to apply L1 regularization to weights.
+
+L1 regularization encourages sparsity.
+
+##### Args:
+
+
+*  <b>`scale`</b>: A scalar multiplier `Tensor`. 0.0 disables the regularizer.
+
+##### Returns:
+
+  A function with signature `l1(weights, name=None)` that applies L1
+  regularization.
+
+##### Raises:
+
+
+*  <b>`ValueError`</b>: If scale is outside of the range [0.0, 1.0] or if scale is not a
+  float.
+
+
+- - -
+
+### `tf.contrib.layers.l2_regularizer(scale)` {#l2_regularizer}
+
+Returns a function that can be used to apply L2 regularization to weights.
+
+Small values of L2 can help prevent overfitting the training data.
+
+##### Args:
+
+
+*  <b>`scale`</b>: A scalar multiplier `Tensor`. 0.0 disables the regularizer.
+
+##### Returns:
+
+  A function with signature `l2(weights, name=None)` that applies L2
+  regularization.
+
+##### Raises:
+
+
+*  <b>`ValueError`</b>: If scale is outside of the range [0.0, 1.0] or if scale is not a
+  float.
+
+
+
+## Initializers
+
+Initializers are used to initialize variables with sensible values given their
+size, data type, and purpose.
+
+- - -
+
+### `tf.contrib.layers.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)` {#xavier_initializer}
+
+Returns an initializer performing "Xavier" initialization for weights.
+
+This function implements the weight initialization from:
+
+Xavier Glorot and Yoshua Bengio (2010):
+         Understanding the difficulty of training deep feedforward neural
+         networks. International conference on artificial intelligence and
+         statistics.
+
+This initializer is designed to keep the scale of the gradients roughly the
+same in all layers. For a uniform distribution this ends up being the range:
+`x = sqrt(6. / (in + out)); [-x, x]` and for a normal distribution a standard
+deviation of `sqrt(3. / (in + out))` is used.
+
+The returned initializer assumes that the shape of the weight matrix to be
+initialized is `[in, out]`.
+
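+For illustration, assume a weight matrix of shape `[300, 100]` (sizes chosen
+here only as an example), so `in = 300` and `out = 100`. The formulas above
+then give a uniform range of `[-x, x]` with
+
+$$x = \sqrt{6 / (300 + 100)} = \sqrt{0.015} \approx 0.1225$$
+
+and, for the normal variant, a standard deviation of
+
+$$\sqrt{3 / (300 + 100)} = \sqrt{0.0075} \approx 0.0866$$
+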
+##### Args:
+
+
+*  <b>`uniform`</b>: Whether to use uniform or normal distributed random initialization.
+*  <b>`seed`</b>: A Python integer. Used to create random seeds. See
+    [`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
+    for behavior.
+*  <b>`dtype`</b>: The data type. Only floating point types are supported.
+
+##### Returns:
+
+  An initializer for a 2-D weight matrix.
+
+##### Raises:
+
+
+*  <b>`TypeError`</b>: If dtype is not a floating point type.
+
+
+- - -
+
+### `tf.contrib.layers.xavier_initializer_conv2d(uniform=True, seed=None, dtype=tf.float32)` {#xavier_initializer_conv2d}
+
+Returns an "Xavier" initializer for 2D convolution weights.
+
+For details on the initialization performed, see `xavier_initializer`. This
+function initializes a convolution weight variable which is assumed to be 4-D.
+The first two dimensions are expected to be the kernel size, the third
+dimension is the number of input channels, and the last dimension is the
+number of output channels.
+
+The number of inputs is therefore `shape[0]*shape[1]*shape[2]`, and the number
+of outputs is `shape[0]*shape[1]*shape[3]`.
+
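+As a worked example under assumed sizes, a `5 x 5` kernel with 3 input
+channels and 64 output channels has shape `[5, 5, 3, 64]`, which gives
+
+$$in = 5 \cdot 5 \cdot 3 = 75, \qquad out = 5 \cdot 5 \cdot 64 = 1600$$
+
+Substituting these counts into the uniform range described for
+`xavier_initializer` gives `[-x, x]` with
+
+$$x = \sqrt{6 / (75 + 1600)} \approx 0.0599$$
+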
+##### Args:
+
+
+*  <b>`uniform`</b>: Whether to use uniform or normal distributed random initialization.
+*  <b>`seed`</b>: A Python integer. Used to create random seeds. See
+    [`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
+    for behavior.
+*  <b>`dtype`</b>: The data type. Only floating point types are supported.
+
+##### Returns:
+
+  An initializer for a 4-D weight matrix.
+
+##### Raises:
+
+
+*  <b>`TypeError`</b>: If dtype is not a floating point type.
+
+
+
+## Summaries
+
+Helper functions to summarize specific variables or ops.
+
+- - -
+
+### `tf.contrib.layers.summarize_activation(op)` {#summarize_activation}
+
+Summarize an activation.
+
+This applies the given activation and adds useful summaries specific to the
+activation.
+
+##### Args:
+
+
+*  <b>`op`</b>: The tensor to summarize (assumed to be a layer activation).
+
+##### Returns:
+
+  The summary op created to summarize `op`.
+
+
+- - -
+
+### `tf.contrib.layers.summarize_tensor(tensor)` {#summarize_tensor}
+
+Summarize a tensor using a suitable summary type.
+
+This function adds a summary op for `tensor`. The type of summary depends on
+the shape of `tensor`. For scalars, a `scalar_summary` is created; for all
+other tensors, `histogram_summary` is used.
+
+##### Args:
+
+
+*  <b>`tensor`</b>: The tensor to summarize.
+
+##### Returns:
+
+  The summary op created.
+
+
+- - -
+
+### `tf.contrib.layers.summarize_tensors(tensors, summarizer=summarize_tensor)` {#summarize_tensors}
+
+Summarize a set of tensors.
+
+
+- - -
+
+### `tf.contrib.layers.summarize_collection(collection, name_filter=None, summarizer=summarize_tensor)` {#summarize_collection}
+
+Summarize a graph collection of tensors, possibly filtered by name.
+
+
+
+The layers module defines convenience functions `summarize_variables`,
+`summarize_weights` and `summarize_biases`, which set the `collection` argument
+of `summarize_collection` to `VARIABLES`, `WEIGHTS` and `BIASES`, respectively.
+
+- - -
+
+### `tf.contrib.layers.summarize_activations(name_filter=None, summarizer=summarize_activation)` {#summarize_activations}
+
+Summarize activations, using `summarize_activation` to summarize.
+
+
+
+## Other Functions and Classes
+- - -
+
+### `tf.contrib.layers.assert_same_float_dtype(tensors=None, dtype=None)` {#assert_same_float_dtype}
+
+Validate and return float type based on `tensors` and `dtype`.
+
+For ops such as matrix multiplication, inputs and weights must be of the
+same float type. This function validates that all `tensors` are the same type,
+validates that type is `dtype` (if supplied), and returns the type. Type must
+be `dtypes.float32` or `dtypes.float64`. If neither `tensors` nor
+`dtype` is supplied, default to `dtypes.float32`.
+
+##### Args:
+
+
+*  <b>`tensors`</b>: Tensors of input values. Can include `None` elements, which will be
+      ignored.
+*  <b>`dtype`</b>: Expected type.
+
+##### Returns:
+
+  Validated type.
+
+##### Raises:
+
+
+*  <b>`ValueError`</b>: if neither `tensors` nor `dtype` is supplied, or result is not
+      float.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/contrib.util.md b/tensorflow/g3doc/api_docs/python/contrib.util.md
new file mode 100644
index 0000000000..26b71172cb
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/contrib.util.md
@@ -0,0 +1,85 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Utilities (contrib)
+[TOC]
+
+Utilities for dealing with Tensors.
+
+## Miscellaneous Utility Functions
+
+- - -
+
+### `tf.contrib.util.constant_value(tensor)` {#constant_value}
+
+Returns the constant value of the given tensor, if efficiently calculable.
+
+This function attempts to partially evaluate the given tensor, and
+returns its value as a numpy ndarray if this succeeds.
+
+TODO(mrry): Consider whether this function should use a registration
+mechanism like gradients and ShapeFunctions, so that it is easily
+extensible.
+
+##### Args:
+
+
+*  <b>`tensor`</b>: The Tensor to be evaluated.
+
+##### Returns:
+
+  A numpy ndarray containing the constant value of the given `tensor`,
+  or None if it cannot be calculated.
+
+##### Raises:
+
+
+*  <b>`TypeError`</b>: if tensor is not an ops.Tensor.
+
+
+- - -
+
+### `tf.contrib.util.make_tensor_proto(values, dtype=None, shape=None)` {#make_tensor_proto}
+
+Create a TensorProto.
+
+##### Args:
+
+
+*  <b>`values`</b>: Values to put in the TensorProto.
+*  <b>`dtype`</b>: Optional tensor_pb2 DataType value.
+*  <b>`shape`</b>: List of integers representing the dimensions of tensor.
+
+##### Returns:
+
+  A TensorProto. Depending on the type, it may contain data in the
+  "tensor_content" attribute, which is not directly useful to Python programs.
+  To access the values you should convert the proto back to a numpy ndarray
+  with tensor_util.MakeNdarray(proto).
+
+##### Raises:
+
+
+*  <b>`TypeError`</b>: if unsupported types are provided.
+*  <b>`ValueError`</b>: if arguments have inappropriate values.
+
+make_tensor_proto accepts "values" of a python scalar, a python list, a
+numpy ndarray, or a numpy scalar.
+
+If "values" is a python scalar or a python list, make_tensor_proto
+first converts it to a numpy ndarray. If dtype is None, the
+conversion tries its best to infer the right numpy data
+type. Otherwise, the resulting numpy array has a data type
+compatible with the given dtype.
+
+In either case above, the numpy ndarray (either the caller provided
+or the auto converted) must have a type compatible with dtype.
+
+make_tensor_proto then converts the numpy array to a tensor proto.
+
+If "shape" is None, the resulting tensor proto represents the numpy
+array precisely.
+
+Otherwise, "shape" specifies the tensor's shape and the numpy array
+cannot have more elements than what "shape" specifies.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/control_flow_ops.md b/tensorflow/g3doc/api_docs/python/control_flow_ops.md
index 739963d3d0..c4b4d6e4f4 100644
--- a/tensorflow/g3doc/api_docs/python/control_flow_ops.md
+++ b/tensorflow/g3doc/api_docs/python/control_flow_ops.md
@@ -3,7 +3,7 @@
 # Control Flow
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
diff --git a/tensorflow/g3doc/api_docs/python/image.md b/tensorflow/g3doc/api_docs/python/image.md
index 6a5b6fb821..094488a0b9 100644
--- a/tensorflow/g3doc/api_docs/python/image.md
+++ b/tensorflow/g3doc/api_docs/python/image.md
@@ -3,7 +3,7 @@
 # Images
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
@@ -1094,6 +1094,132 @@ Note that this implementation is limited:
 
 
 
+## Working with Bounding Boxes
+
+- - -
+
+### `tf.image.draw_bounding_boxes(images, boxes, name=None)` {#draw_bounding_boxes}
+
+Draw bounding boxes on a batch of images.
+
+Outputs a copy of `images` but draws on top of the pixels zero or more bounding
+boxes specified by the locations in `boxes`. The coordinates of each
+bounding box in `boxes` are encoded as `[y_min, x_min, y_max, x_max]`. The
+bounding box coordinates are floats in `[0.0, 1.0]` relative to the width and
+height of the underlying image.
+
+For example, if an image is 100 x 200 pixels (height x width) and the bounding
+box is `[0.1, 0.2, 0.5, 0.9]`, the upper-left and bottom-right corners of the
+bounding box will be `(40, 10)` and `(180, 50)` in `(x, y)` pixel coordinates.
+
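+As a sketch of the conversion from relative to pixel coordinates, with an
+image size and box chosen only for illustration: for an image of height 480
+and width 640 and a box `[0.25, 0.5, 0.75, 1.0]`, the `y` values scale with
+the height and the `x` values with the width, giving
+
+$$y_{min} = 0.25 \cdot 480 = 120, \quad x_{min} = 0.5 \cdot 640 = 320$$
+
+$$y_{max} = 0.75 \cdot 480 = 360, \quad x_{max} = 1.0 \cdot 640 = 640$$
+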
+Parts of the bounding box may fall outside the image.
+
+##### Args:
+
+
+*  <b>`images`</b>: A `Tensor` of type `float32`.
+    4-D with shape `[batch, height, width, depth]`. A batch of images.
+*  <b>`boxes`</b>: A `Tensor` of type `float32`.
+    3-D with shape `[batch, num_bounding_boxes, 4]` containing bounding
+    boxes.
+*  <b>`name`</b>: A name for the operation (optional).
+
+##### Returns:
+
+  A `Tensor` of type `float32`.
+  4-D with the same shape as `images`. The batch of input images with
+  bounding boxes drawn on the images.
+
+
+- - -
+
+### `tf.image.sample_distorted_bounding_box(image_size, bounding_boxes, seed=None, seed2=None, min_object_covered=None, aspect_ratio_range=None, area_range=None, max_attempts=None, use_image_if_no_bounding_boxes=None, name=None)` {#sample_distorted_bounding_box}
+
+Generate a single randomly distorted bounding box for an image.
+
+Bounding box annotations are often supplied in addition to ground-truth labels
+in image recognition or object localization tasks. A common technique for
+training such a system is to randomly distort an image while preserving
+its content, i.e. *data augmentation*. This Op outputs a randomly distorted
+localization of an object, i.e. bounding box, given an `image_size`,
+`bounding_boxes` and a series of constraints.
+
+The output of this Op is a single bounding box that may be used to crop the
+original image. The output is returned as 3 tensors: `begin`, `size` and
+`bboxes`. The first 2 tensors can be fed directly into `tf.slice` to crop the
+image. The latter may be supplied to `tf.image.draw_bounding_boxes` to visualize
+what the bounding box looks like.
+
+Bounding boxes are supplied and returned as `[y_min, x_min, y_max, x_max]`. The
+bounding box coordinates are floats in `[0.0, 1.0]` relative to the width and
+height of the underlying image.
+
+For example,
+
+    # Generate a single distorted bounding box.
+    begin, size, bbox_for_draw = tf.image.sample_distorted_bounding_box(
+        tf.shape(image),
+        bounding_boxes=bounding_boxes)
+
+    # Draw the bounding box in an image summary.
+    image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0),
+                                                  bbox_for_draw)
+    tf.image_summary('images_with_box', image_with_box)
+
+    # Employ the bounding box to distort the image.
+    distorted_image = tf.slice(image, begin, size)
+
+Note that if no bounding box information is available, setting
+`use_image_if_no_bounding_boxes = true` will assume there is a single implicit
+bounding box covering the whole image. If `use_image_if_no_bounding_boxes` is
+false and no bounding boxes are supplied, an error is raised.
+
+##### Args:
+
+
+*  <b>`image_size`</b>: A `Tensor`. Must be one of the following types: `uint8`, `int8`, `int16`, `int32`, `int64`.
+    1-D, containing `[height, width, channels]`.
+*  <b>`bounding_boxes`</b>: A `Tensor` of type `float32`.
+    3-D with shape `[batch, N, 4]` describing the N bounding boxes
+    associated with the image.
+*  <b>`seed`</b>: An optional `int`. Defaults to `0`.
+    If either `seed` or `seed2` are set to non-zero, the random number
+    generator is seeded by the given `seed`.  Otherwise, it is seeded by a random
+    seed.
+*  <b>`seed2`</b>: An optional `int`. Defaults to `0`.
+    A second seed to avoid seed collision.
+*  <b>`min_object_covered`</b>: An optional `float`. Defaults to `0.1`.
+    The cropped area of the image must contain at least this
+    fraction of any bounding box supplied.
+*  <b>`aspect_ratio_range`</b>: An optional list of `floats`. Defaults to `[0.75, 1.33]`.
+    The cropped area of the image must have an aspect ratio =
+    width / height within this range.
+*  <b>`area_range`</b>: An optional list of `floats`. Defaults to `[0.05, 1]`.
+    The cropped area of the image must contain a fraction of the
+    supplied image within this range.
+*  <b>`max_attempts`</b>: An optional `int`. Defaults to `100`.
+    Number of attempts at generating a cropped region of the image
+    of the specified constraints. After `max_attempts` failures, return the entire
+    image.
+*  <b>`use_image_if_no_bounding_boxes`</b>: An optional `bool`. Defaults to `False`.
+    Controls behavior if no bounding boxes are supplied.
+    If true, assume an implicit bounding box covering the whole input. If false,
+    raise an error.
+*  <b>`name`</b>: A name for the operation (optional).
+
+##### Returns:
+
+  A tuple of `Tensor` objects (begin, size, bboxes).
+
+*  <b>`begin`</b>: A `Tensor`. Has the same type as `image_size`. 1-D, containing `[offset_height, offset_width, 0]`. Provide as input to
+    `tf.slice`.
+*  <b>`size`</b>: A `Tensor`. Has the same type as `image_size`. 1-D, containing `[target_height, target_width, -1]`. Provide as input to
+    `tf.slice`.
+*  <b>`bboxes`</b>: A `Tensor` of type `float32`. 3-D with shape `[1, 1, 4]` containing the distorted bounding box.
+    Provide as input to `tf.image.draw_bounding_boxes`.
+
+
+
 ## Other Functions and Classes
 - - -
 
diff --git a/tensorflow/g3doc/api_docs/python/index.md b/tensorflow/g3doc/api_docs/python/index.md
index 6a4f1662df..34c7c883b5 100644
--- a/tensorflow/g3doc/api_docs/python/index.md
+++ b/tensorflow/g3doc/api_docs/python/index.md
@@ -232,6 +232,7 @@
   * [`crop_to_bounding_box`](../../api_docs/python/image.md#crop_to_bounding_box)
   * [`decode_jpeg`](../../api_docs/python/image.md#decode_jpeg)
   * [`decode_png`](../../api_docs/python/image.md#decode_png)
+  * [`draw_bounding_boxes`](../../api_docs/python/image.md#draw_bounding_boxes)
   * [`encode_jpeg`](../../api_docs/python/image.md#encode_jpeg)
   * [`encode_png`](../../api_docs/python/image.md#encode_png)
   * [`extract_glimpse`](../../api_docs/python/image.md#extract_glimpse)
@@ -255,6 +256,7 @@
   * [`resize_nearest_neighbor`](../../api_docs/python/image.md#resize_nearest_neighbor)
   * [`rgb_to_grayscale`](../../api_docs/python/image.md#rgb_to_grayscale)
   * [`rgb_to_hsv`](../../api_docs/python/image.md#rgb_to_hsv)
+  * [`sample_distorted_bounding_box`](../../api_docs/python/image.md#sample_distorted_bounding_box)
   * [`saturate_cast`](../../api_docs/python/image.md#saturate_cast)
   * [`transpose_image`](../../api_docs/python/image.md#transpose_image)
 
@@ -413,3 +415,21 @@
   * [`is_built_with_cuda`](../../api_docs/python/test.md#is_built_with_cuda)
   * [`main`](../../api_docs/python/test.md#main)
 
+* **[Layers (contrib)](../../api_docs/python/contrib.layers.md)**:
+  * [`assert_same_float_dtype`](../../api_docs/python/contrib.layers.md#assert_same_float_dtype)
+  * [`convolution2d`](../../api_docs/python/contrib.layers.md#convolution2d)
+  * [`fully_connected`](../../api_docs/python/contrib.layers.md#fully_connected)
+  * [`l1_regularizer`](../../api_docs/python/contrib.layers.md#l1_regularizer)
+  * [`l2_regularizer`](../../api_docs/python/contrib.layers.md#l2_regularizer)
+  * [`summarize_activation`](../../api_docs/python/contrib.layers.md#summarize_activation)
+  * [`summarize_activations`](../../api_docs/python/contrib.layers.md#summarize_activations)
+  * [`summarize_collection`](../../api_docs/python/contrib.layers.md#summarize_collection)
+  * [`summarize_tensor`](../../api_docs/python/contrib.layers.md#summarize_tensor)
+  * [`summarize_tensors`](../../api_docs/python/contrib.layers.md#summarize_tensors)
+  * [`xavier_initializer`](../../api_docs/python/contrib.layers.md#xavier_initializer)
+  * [`xavier_initializer_conv2d`](../../api_docs/python/contrib.layers.md#xavier_initializer_conv2d)
+
+* **[Utilities (contrib)](../../api_docs/python/contrib.util.md)**:
+  * [`constant_value`](../../api_docs/python/contrib.util.md#constant_value)
+  * [`make_tensor_proto`](../../api_docs/python/contrib.util.md#make_tensor_proto)
+
diff --git a/tensorflow/g3doc/api_docs/python/io_ops.md b/tensorflow/g3doc/api_docs/python/io_ops.md
index 185ae07e32..3ef44c65a2 100644
--- a/tensorflow/g3doc/api_docs/python/io_ops.md
+++ b/tensorflow/g3doc/api_docs/python/io_ops.md
@@ -3,7 +3,7 @@
 # Inputs and Readers
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
diff --git a/tensorflow/g3doc/api_docs/python/math_ops.md b/tensorflow/g3doc/api_docs/python/math_ops.md
index 7ba87ed026..6e4b653cf9 100644
--- a/tensorflow/g3doc/api_docs/python/math_ops.md
+++ b/tensorflow/g3doc/api_docs/python/math_ops.md
@@ -3,7 +3,7 @@
 # Math
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
@@ -395,7 +395,7 @@ Given a tensor `x` and a tensor `y`, this operation computes \\(x^y\\) for
 corresponding elements in `x` and `y`. For example:
 
 ```
-# tensor 'x' is [[2, 2]], [3, 3]]
+# tensor 'x' is [[2, 2], [3, 3]]
 # tensor 'y' is [[8, 16], [2, 3]]
 tf.pow(x, y) ==> [[256, 65536], [9, 27]]
 ```
@@ -1670,7 +1670,7 @@ otherwise, these are inferred.
 For example:
 
 ```python
-# tensor 'a' is [[1, 2], [3, 4]
+# tensor 'a' is [[1, 2], [3, 4]]
 # tensor `b` is [[5, 0], [0, 6]]
 tf.accumulate_n([a, b, a]) ==> [[7, 4], [6, 14]]
 
diff --git a/tensorflow/g3doc/api_docs/python/nn.md b/tensorflow/g3doc/api_docs/python/nn.md
index 426a114dd8..a3adc4b091 100644
--- a/tensorflow/g3doc/api_docs/python/nn.md
+++ b/tensorflow/g3doc/api_docs/python/nn.md
@@ -3,7 +3,7 @@
 # Neural Network
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
@@ -1045,7 +1045,8 @@ expression `tf.nn.softmax(tf.matmul(inputs, weights) + biases)`.
 See our [Candidate Sampling Algorithms Reference]
 (../../extras/candidate_sampling.pdf)
 
-Also see Section 3 of http://arxiv.org/abs/1412.2007 for the math.
+Also see Section 3 of [Jean et al., 2014](http://arxiv.org/abs/1412.2007)
+([pdf](http://arxiv.org/pdf/1412.2007.pdf)) for the math.
 
 ##### Args:
 
diff --git a/tensorflow/g3doc/api_docs/python/script_ops.md b/tensorflow/g3doc/api_docs/python/script_ops.md
index aa1ff4c9c5..a76b65e6bc 100644
--- a/tensorflow/g3doc/api_docs/python/script_ops.md
+++ b/tensorflow/g3doc/api_docs/python/script_ops.md
@@ -3,7 +3,7 @@
 # Wraps python functions
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
diff --git a/tensorflow/g3doc/api_docs/python/sparse_ops.md b/tensorflow/g3doc/api_docs/python/sparse_ops.md
index a6893065e1..74d18bc689 100644
--- a/tensorflow/g3doc/api_docs/python/sparse_ops.md
+++ b/tensorflow/g3doc/api_docs/python/sparse_ops.md
@@ -3,7 +3,7 @@
 # Sparse Tensors
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
diff --git a/tensorflow/g3doc/api_docs/python/state_ops.md b/tensorflow/g3doc/api_docs/python/state_ops.md
index 0f18ff8346..29355cb1ff 100644
--- a/tensorflow/g3doc/api_docs/python/state_ops.md
+++ b/tensorflow/g3doc/api_docs/python/state_ops.md
@@ -3,7 +3,7 @@
 # Variables
 
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 
 [TOC]
 
@@ -1371,7 +1371,8 @@ to keep the scale intact, where `dim = W.shape[0]` (the size of the input).
 A similar calculation for convolutional networks gives an analogous result
 with `dim` equal to the product of the first 3 dimensions.  When
 nonlinearities are present, we need to multiply this by a constant `factor`.
-See <https://arxiv.org/pdf/1412.6558v3.pdf> for deeper motivation, experiments
+See [Sussillo et al., 2014](https://arxiv.org/abs/1412.6558)
+([pdf](http://arxiv.org/pdf/1412.6558.pdf)) for deeper motivation, experiments
 and the calculation of constants. In section 2.3 there, the constants were
 numerically computed: for a linear layer it's 1.0, relu: ~1.43, tanh: ~1.15.
 
@@ -1436,8 +1437,9 @@ This operation computes
 This operation outputs `ref` after the update is done.
 This makes it easier to chain operations that need to use the reset value.
 
-If `indices` contains duplicate entries, lexicographically later entries
-override earlier entries.
+If a value in `ref` is to be updated more than once because there are
+duplicate entries in `indices`, the order in which the updates happen
+for each value is undefined.
 
 Requires `updates.shape = indices.shape + ref.shape[1:]`.
 
diff --git a/tensorflow/g3doc/api_docs/python/train.md b/tensorflow/g3doc/api_docs/python/train.md
index 66e89b7990..3e78ad3eac 100644
--- a/tensorflow/g3doc/api_docs/python/train.md
+++ b/tensorflow/g3doc/api_docs/python/train.md
@@ -356,7 +356,8 @@ Construct a new Momentum optimizer.
 
 Optimizer that implements the Adam algorithm.
 
-See this [paper](http://arxiv.org/pdf/1412.6980v7.pdf).
+See [Kingma et al., 2014](http://arxiv.org/abs/1412.6980)
+([pdf](http://arxiv.org/pdf/1412.6980.pdf)).
 
 - - -
 
@@ -725,8 +726,8 @@ otherwise they're all shrunk by the global ratio.
 Any of the entries of `t_list` that are of type `None` are ignored.
 
 This is the correct way to perform gradient clipping (for example, see
-R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training
-Recurrent Neural Networks".  http://arxiv.org/abs/1211.5063)
+[Pascanu et al., 2012](http://arxiv.org/abs/1211.5063)
+([pdf](http://arxiv.org/pdf/1211.5063.pdf))).
 
 However, it is slower than `clip_by_norm()` because all the parameters must be
 ready before the clipping operation can be performed.
@@ -1878,7 +1879,7 @@ a Python iterator that yields `Event` protocol buffers.
 Example: Print the contents of an events file.
 
 ```python
-for e in tf.summary_iterator(path to events file):
+for e in tf.train.summary_iterator(path to events file):
     print(e)
 ```
 
@@ -1889,7 +1890,7 @@ Example: Print selected summary values.
 # summary value tag 'loss'.  These could have been added by calling
 # `add_summary()`, passing the output of a scalar summary op created with
 # with: `tf.scalar_summary(['loss'], loss_tensor)`.
-for e in tf.summary_iterator(path to events file):
+for e in tf.train.summary_iterator(path to events file):
     for v in e.summary.value:
         if v.tag == 'loss':
             print(v.simple_value)
diff --git a/tensorflow/g3doc/get_started/os_setup.md b/tensorflow/g3doc/get_started/os_setup.md
index feeb9f40db..e570cc67cf 100644
--- a/tensorflow/g3doc/get_started/os_setup.md
+++ b/tensorflow/g3doc/get_started/os_setup.md
@@ -5,11 +5,11 @@ github source.
 
 ## Requirements
 
-The TensorFlow Python API currently supports Python 2.7 and Python 3.3+ from
-source.
+The TensorFlow Python API supports Python 2.7 and Python 3.3+.
 
-The GPU version (Linux only) currently requires the Cuda Toolkit 7.0 and cuDNN
-v2.  Please see [Cuda installation](#optional-install-cuda-gpus-on-linux).
+The GPU version (Linux only) requires the Cuda Toolkit >= 7.0 and cuDNN >=
+v2.  Please see [Cuda installation](#optional-install-cuda-gpus-on-linux)
+for details.
 
 ## Overview
 
@@ -53,28 +53,28 @@ Install TensorFlow:
 
 ```bash
 # Ubuntu/Linux 64-bit, CPU only:
-$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
+$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.0-py2-none-linux_x86_64.whl
 
 # Ubuntu/Linux 64-bit, GPU enabled:
-$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
+$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.7.0-py2-none-linux_x86_64.whl
 
 # Mac OS X, CPU only:
 $ sudo easy_install --upgrade six
-$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.6.0-py2-none-any.whl
+$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.0-py2-none-any.whl
 ```
 
 For python3:
 
 ```bash
 # Ubuntu/Linux 64-bit, CPU only:
-$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.6.0-cp34-none-linux_x86_64.whl
+$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.0-py3-none-linux_x86_64.whl
 
 # Ubuntu/Linux 64-bit, GPU enabled:
-$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp34-none-linux_x86_64.whl
+$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.7.0-py3-none-linux_x86_64.whl
 
 # Mac OS X, CPU only:
 $ sudo easy_install --upgrade six
-$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.6.0-py3-none-any.whl
+$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.0-py3-none-any.whl
 ```
 
 
@@ -121,13 +121,13 @@ $ source ~/tensorflow/bin/activate.csh  # If using csh
 (tensorflow)$  # Your prompt should change
 
 # Ubuntu/Linux 64-bit, CPU only:
-(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
+(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.0-py2-none-linux_x86_64.whl
 
 # Ubuntu/Linux 64-bit, GPU enabled:
-(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
+(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.7.0-py2-none-linux_x86_64.whl
 
 # Mac OS X, CPU only:
-(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.6.0-py2-none-any.whl
+(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.0-py2-none-any.whl
 ```
 
 and again for python3:
@@ -138,13 +138,13 @@ $ source ~/tensorflow/bin/activate.csh  # If using csh
 (tensorflow)$  # Your prompt should change
 
 # Ubuntu/Linux 64-bit, CPU only:
-(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.6.0-cp34-none-linux_x86_64.whl
+(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.0-py3-none-linux_x86_64.whl
 
 # Ubuntu/Linux 64-bit, GPU enabled:
-(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp34-none-linux_x86_64.whl
+(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.7.0-py3-none-linux_x86_64.whl
 
 # Mac OS X, CPU only:
-(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.6.0-py3-none-any.whl
+(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.0-py3-none-any.whl
 ```
 
 With the Virtualenv environment activated, you can now
@@ -186,7 +186,7 @@ code.
 * `b.gcr.io/tensorflow/tensorflow:latest-devel-gpu`: GPU Binary image plus source
 code.
 
-We also have tags with `latest` replaced by a released version (eg `0.6.0-gpu`).
+We also have tags with `latest` replaced by a released version (e.g., `0.7.0-gpu`).
 
 With Docker the installation is as follows:
 
@@ -266,7 +266,7 @@ The exact location of the Python library depends on your system, but is usually
 /usr/local/lib/python2.7/site-packages/tensorflow
 ```
 
-You can find out the directory with the following command:
+You can find out the directory with the following command (make sure to use the Python you installed TensorFlow to, for example, use `python3` instead of `python` if you installed for Python 3):
 
 ```bash
 $ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
@@ -274,7 +274,7 @@ $ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname
 
 The simple demo model for classifying handwritten digits from the MNIST dataset
 is in the sub-directory `models/image/mnist/convolutional.py`.  You can run it from the command
-line as follows:
+line as follows (make sure to use the Python you installed TensorFlow with):
 
 ```bash
 # Using 'python -m' to find the program in the python search path:
@@ -285,7 +285,9 @@ Extracting data/t10k-images-idx3-ubyte.gz
 Extracting data/t10k-labels-idx1-ubyte.gz
 ...etc...
 
-# You can alternatively pass the path to the model program file to the python interpreter.
+# You can alternatively pass the path to the model program file to the python
+# interpreter (make sure to use the python distribution you installed
+# TensorFlow to, for example, .../python3.X/... for Python 3).
 $ python /usr/local/lib/python2.7/dist-packages/tensorflow/models/image/mnist/convolutional.py
 ...
 ```
@@ -348,10 +350,10 @@ Please specify the location of python. [Default is /usr/bin/python]:
 
 #### Optional: Install CUDA (GPUs on Linux)
 
-In order to build or run TensorFlow with GPU support, both Cuda Toolkit 7.0 and
-cuDNN v2 from NVIDIA need to be installed.
+In order to build or run TensorFlow with GPU support, both NVIDIA's Cuda Toolkit (>= 7.0) and
+cuDNN (>= v2) need to be installed.
 
-TensorFlow GPU support requires having a GPU card with NVidia Compute Capability >= 3.5.
+TensorFlow GPU support requires having a GPU card with NVidia Compute Capability >= 3.0.
 Supported cards include but are not limited to:
 
 * NVidia Titan
@@ -359,18 +361,19 @@ Supported cards include but are not limited to:
 * NVidia K20
 * NVidia K40
 
-##### Download and install Cuda Toolkit 7.0
+##### Download and install Cuda Toolkit
 
-https://developer.nvidia.com/cuda-toolkit-70
+https://developer.nvidia.com/cuda-downloads
 
 Install the toolkit into e.g. `/usr/local/cuda`
 
-##### Download and install cuDNN v2
+##### Download and install cuDNN
 
-https://developer.nvidia.com/rdp/cudnn-archive
+https://developer.nvidia.com/cudnn
 
 Uncompress and copy the cuDNN files into the toolkit directory.  Assuming the
-toolkit is installed in `/usr/local/cuda`:
+toolkit is installed in `/usr/local/cuda`, run the following commands (edited
+to reflect the cuDNN version you downloaded):
 
 ``` bash
 tar xvzf cudnn-6.5-linux-x64-v2.tgz
@@ -380,8 +383,12 @@ sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
 ```
 
 ##### Configure TensorFlow's canonical view of Cuda libraries
+
 When running the `configure` script from the root of your source tree, select
-the option `Y` when asked to build TensorFlow with GPU support.
+the option `Y` when asked to build TensorFlow with GPU support. If you have 
+several versions of Cuda or cuDNN installed, you should definitely select
+one explicitly instead of relying on the system default. You should see
+prompts like the following:
 
 ``` bash
 $ ./configure
@@ -389,12 +396,24 @@ Please specify the location of python. [Default is /usr/bin/python]:
 Do you wish to build TensorFlow with GPU support? [y/N] y
 GPU support will be enabled for TensorFlow
 
-Please specify the location where CUDA 7.0 toolkit is installed. Refer to
-README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda
+Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave
+empty to use system default]: 7.5
 
-Please specify the location where the cuDNN v2 library is installed. Refer to
+Please specify the location where CUDA 7.5 toolkit is installed. Refer to
 README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda
 
+Please specify the Cudnn version you want to use. [Leave empty to use system
+default]: 4.0.4
+
+Please specify the location where the cuDNN 4.0.4 library is installed. Refer to
+README.md for more details. [default is: /usr/local/cuda]: /usr/local/cudnn-r4-rc/
+
+Please specify a list of comma-separated Cuda compute capabilities you want to
+build with. You can find the compute capability of your device at: 
+https://developer.nvidia.com/cuda-gpus.
+Please note that each additional compute capability significantly increases your
+build time and binary size. [Default is: \"3.5,5.2\"]: 3.5
+    
 Setting up Cuda include
 Setting up Cuda lib64
 Setting up Cuda bin
@@ -404,7 +423,9 @@ Configuration finished
 
 This creates a canonical set of symbolic links to the Cuda libraries on your system.
 Every time you change the Cuda library paths you need to run this step again before
-you invoke the bazel build command.
+you invoke the bazel build command. For the Cudnn libraries, use '6.5' for R2, '7.0'
+for R3, and '4.0.4' for R4-RC.
+
 
 ##### Build your target with GPU support
 From the root of your source tree, run:
@@ -422,57 +443,6 @@ $ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
 
 Note that "--config=cuda" is needed to enable the GPU support.
 
-##### Enabling Cuda 3.0
-TensorFlow officially supports Cuda devices with 3.5 and 5.2 compute
-capabilities. In order to enable earlier Cuda devices such as Grid K520, you
-need to target Cuda 3.0. This can be done through TensorFlow unofficial
-settings with "configure".
-
-```bash
-$ TF_UNOFFICIAL_SETTING=1 ./configure
-
-# Same as the official settings above
-
-WARNING: You are configuring unofficial settings in TensorFlow. Because some
-external libraries are not backward compatible, these settings are largely
-untested and unsupported.
-
-Please specify a list of comma-separated Cuda compute capabilities you want to
-build with. You can find the compute capability of your device at:
-https://developer.nvidia.com/cuda-gpus.
-Please note that each additional compute capability significantly increases
-your build time and binary size. [Default is: "3.5,5.2"]: 3.0
-
-Setting up Cuda include
-Setting up Cuda lib64
-Setting up Cuda bin
-Setting up Cuda nvvm
-Configuration finished
-```
-
-##### Using a different Cuda SDK and Cudnn versions
-TensorFlow officially supports Cuda 7.0 and Cudnn V2 (6.5) at this point. In
-order to use a different Cuda SDK or Cudnn libraries, use the unofficial setting
-with "configure"
-
-```bash
-$ TF_UNOFFICIAL_SETTING=1 ./configure
-...
-Please specify the Cuda SDK version you want to use. [Default is 7.0]: 7.5
-Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-7.5
-Please specify the Cudnn version you want to use. [Default is 6.5]: 4.0.4
-Please specify the location where cuDNN 4.0.4 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-7.5]: /usr/local/cudnn-r4-rc/
-...
-Setting up Cuda include
-Setting up Cuda lib64
-Setting up Cuda bin
-Setting up Cuda nvvm
-Configuration finished
-```
-
-For the Cudnn libraries, use '6.5' for R2, '7.0' for R3, and '4.0.4' for
-R4-RC.
-
 ##### Known issues
 
 * Although it is possible to build both Cuda and non-Cuda configs under the same
@@ -481,8 +451,7 @@ configs in the same source tree.
 
 * You have to run configure before running bazel build. Otherwise, the build
 will fail with a clear error message. In the future, we might consider making
-this more convenient by including the configure step in our build process,
-given necessary bazel new feature support.
+this more convenient by including the configure step in our build process.
 
 ### Installation for Mac OS X
 
@@ -543,7 +512,7 @@ $ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_pack
 $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
 
 # The name of the .whl file will depend on your platform.
-$ pip install /tmp/tensorflow_pkg/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
+$ pip install /tmp/tensorflow_pkg/tensorflow-0.7.0-py2-none-linux_x86_64.whl
 ```
 
 ## Setting up TensorFlow for Development
@@ -608,6 +577,8 @@ ImportError: libcudart.so.7.0: cannot open shared object file: No such file or d
 ```
 
 Make sure you followed the GPU installation [instructions](#optional-install-cuda-gpus-on-linux).
+If you built from source, and you left the Cuda or cuDNN version empty, try specifying them
+explicitly.
 
 ### Pip installation issues
 
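The troubleshooting note added above concerns `ImportError: libcudart.so.7.0: cannot open shared object file`. As a rough first check, one can ask Python which CUDA libraries the standard library lookup can see. This is only a sketch: `find_cuda_libs` is a hypothetical helper, not part of TensorFlow, and `ctypes.util.find_library` does not necessarily search the same paths as the dynamic loader, so a `None` result is a hint rather than proof.

```python
import ctypes.util


def find_cuda_libs(names=("cudart", "cudnn")):
    """Map each library name to what find_library resolves, or None.

    Hypothetical helper for illustration only; the library names are
    assumptions based on the libcudart / libcudnn files in the README.
    """
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, resolved in find_cuda_libs().items():
        print(name, "->", resolved or "not found")
```

If a library is reported as not found, re-running `./configure` with explicit Cuda and cuDNN versions, as the README change suggests, is the next thing to try.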
diff --git a/tensorflow/g3doc/how_tos/documentation/index.md b/tensorflow/g3doc/how_tos/documentation/index.md
index 4e34d43289..65c7db3514 100755
--- a/tensorflow/g3doc/how_tos/documentation/index.md
+++ b/tensorflow/g3doc/how_tos/documentation/index.md
@@ -320,9 +320,9 @@ Here's an example from the module docsting in `image_ops.py`:
 
     ```python
     # Decode an image and convert it to HSV.
-    rgb_image = tf.decode_png(...,  channels=3)
-    rgb_image_float = tf.convert_image_dtype(rgb_image, tf.float32)
-    hsv_image = tf.rgb_to_hsv(rgb_image)
+    rgb_image = tf.image.decode_png(...,  channels=3)
+    rgb_image_float = tf.image.convert_image_dtype(rgb_image, tf.float32)
+    hsv_image = tf.image.rgb_to_hsv(rgb_image)
     ```
 
 ### Requirements, caveats, important notes.
diff --git a/tensorflow/g3doc/how_tos/index.md b/tensorflow/g3doc/how_tos/index.md
index a0e51b1f43..cfabce7761 100644
--- a/tensorflow/g3doc/how_tos/index.md
+++ b/tensorflow/g3doc/how_tos/index.md
@@ -6,7 +6,7 @@
 TensorFlow Variables are in-memory buffers containing tensors.  Learn how to
 use them to hold and update model parameters during training.
 
-[View Tutorial](../how_tos/variables/index.md)
+[View Tutorial](variables/index.md)
 
 
 ## TensorFlow Mechanics 101
@@ -15,7 +15,7 @@ A step-by-step walk through of the details of using TensorFlow infrastructure
 to train models at scale, using MNIST handwritten digit recognition as a toy
 example.
 
-[View Tutorial](../tutorials/mnist/tf/index.md)
+[View Tutorial](mnist/tf/index.md)
 
 
 ## TensorBoard: Visualizing Learning
@@ -25,7 +25,7 @@ your model(s).  This tutorial describes how to build and run TensorBoard as well
 as how to add Summary ops to automatically output data to the Events files that
 TensorBoard uses for display.
 
-[View Tutorial](../how_tos/summaries_and_tensorboard/index.md)
+[View Tutorial](summaries_and_tensorboard/index.md)
 
 
 ## TensorBoard: Graph Visualization
@@ -33,7 +33,7 @@ TensorBoard uses for display.
 This tutorial describes how to use the graph visualizer in TensorBoard to help
 you understand the dataflow graph and debug it.
 
-[View Tutorial](../how_tos/graph_viz/index.md)
+[View Tutorial](graph_viz/index.md)
 
 
 ## Reading Data
@@ -41,7 +41,7 @@ you understand the dataflow graph and debug it.
 This tutorial describes the three main methods of getting data into your
 TensorFlow program: Feeding, Reading and Preloading.
 
-[View Tutorial](../how_tos/reading_data/index.md)
+[View Tutorial](reading_data/index.md)
 
 
 ## Threading and Queues
@@ -49,7 +49,7 @@ TensorFlow program: Feeding, Reading and Preloading.
 This tutorial describes the various constructs implemented by TensorFlow
 to facilitate asynchronous and concurrent training.
 
-[View Tutorial](../how_tos/threading_and_queues/index.md)
+[View Tutorial](threading_and_queues/index.md)
 
 
 ## Adding a New Op
@@ -57,7 +57,16 @@ to facilitate asynchronous and concurrent training.
 TensorFlow already has a large suite of node operations from which you can
 compose in your graph, but here are the details of how to add you own custom Op.
 
-[View Tutorial](../how_tos/adding_an_op/index.md)
+[View Tutorial](adding_an_op/index.md)
+
+
+## Writing Documentation
+
+TensorFlow's documentation is largely generated from its source code. Here is an
+introduction to the formats we use, a style guide, and instructions on how to
+build updated documentation from the source.
+
+[View Tutorial](documentation/index.md)
 
 
 ## Custom Data Readers
@@ -65,14 +74,14 @@ compose in your graph, but here are the details of how to add you own custom Op.
 If you have a sizable custom data set, you may want to consider extending
 TensorFlow to read your data directly in it's native format.  Here's how.
 
-[View Tutorial](../how_tos/new_data_formats/index.md)
+[View Tutorial](new_data_formats/index.md)
 
 
 ## Using GPUs
 
 This tutorial describes how to construct and execute models on GPU(s).
 
-[View Tutorial](../how_tos/using_gpu/index.md)
+[View Tutorial](using_gpu/index.md)
 
 
 ## Sharing Variables
@@ -83,7 +92,7 @@ different locations in the model construction code.
 
 The "Variable Scope" mechanism is designed to facilitate that.
 
-[View Tutorial](../how_tos/variable_scope/index.md)
+[View Tutorial](variable_scope/index.md)
 
 ## A Tool Developer's Guide to TensorFlow Model Files
 
diff --git a/tensorflow/g3doc/how_tos/new_data_formats/index.md b/tensorflow/g3doc/how_tos/new_data_formats/index.md
index 489e7a5db4..5f6bdda9c9 100644
--- a/tensorflow/g3doc/how_tos/new_data_formats/index.md
+++ b/tensorflow/g3doc/how_tos/new_data_formats/index.md
@@ -211,8 +211,8 @@ format.  For example, you may have an image saved as a string in
 Depending on the format of that image, you might take the corresponding output
 from a
 [`tf.parse_single_example`](../../api_docs/python/io_ops.md#parse_single_example)
-op and call [`tf.decode_jpeg`](../../api_docs/python/image.md#decode_jpeg),
-[`tf.decode_png`](../../api_docs/python/image.md#decode_png), or
+op and call [`tf.image.decode_jpeg`](../../api_docs/python/image.md#decode_jpeg),
+[`tf.image.decode_png`](../../api_docs/python/image.md#decode_png), or
 [`tf.decode_raw`](../../api_docs/python/io_ops.md#decode_raw).  It is common to
 take the output of `tf.decode_raw` and use
 [`tf.slice`](../../api_docs/python/array_ops.md#slice) and
diff --git a/tensorflow/g3doc/how_tos/variable_scope/index.md b/tensorflow/g3doc/how_tos/variable_scope/index.md
index 8732a830c8..e3b02149e6 100644
--- a/tensorflow/g3doc/how_tos/variable_scope/index.md
+++ b/tensorflow/g3doc/how_tos/variable_scope/index.md
@@ -38,7 +38,7 @@ this one, and even here we already have 4 different variables: `conv1_weights`,
 
 The problem arises when you want to reuse this model. Assume you want to
 apply your image filter to 2 different images, `image1` and `image2`.
-You want both images processed by the same filer with the same parameters.
+You want both images processed by the same filter with the same parameters.
 You can call `my_image_filter()` twice, but this will create two sets
 of variables:
 
diff --git a/tensorflow/g3doc/resources/bib.md b/tensorflow/g3doc/resources/bib.md
index 8a787919da..acfdd18767 100644
--- a/tensorflow/g3doc/resources/bib.md
+++ b/tensorflow/g3doc/resources/bib.md
@@ -1,6 +1,7 @@
-# BibTex Citation
+# TensorFlow Whitepaper
+
 If you use TensorFlow in your research and would like to cite the TensorFlow
-system, we suggest you cite the following whitepaper:
+system, we suggest you cite the [whitepaper](http://download.tensorflow.org/paper/whitepaper2015.pdf):
 
 ```
 @misc{tensorflow2015-whitepaper,
diff --git a/tensorflow/g3doc/resources/index.md b/tensorflow/g3doc/resources/index.md
index 8f6211c6c4..5d14dcbdf6 100644
--- a/tensorflow/g3doc/resources/index.md
+++ b/tensorflow/g3doc/resources/index.md
@@ -20,7 +20,11 @@ endorsed by or otherwise affiliated with Google. When referring to our marks,
 please include the following attribution statement: "TensorFlow, the TensorFlow
 logo and any related marks are trademarks of Google Inc."
 
-### What is TensorFlow used for?
+## What is TensorFlow used for?
+
+TensorFlow enables researchers to build machine learning models. We collect such
+models in our [Zoo](https://github.com/tensorflow/models). If you have built a 
+model with TensorFlow, you may consider publishing it there.
 
 We keep a list of projects that use TensorFlow [here](uses.md). If you made
 something amazing with TensorFlow, we'd like to hear about it!
diff --git a/tensorflow/g3doc/tutorials/recurrent/index.md b/tensorflow/g3doc/tutorials/recurrent/index.md
index 82e369b80a..b39dedffb3 100644
--- a/tensorflow/g3doc/tutorials/recurrent/index.md
+++ b/tensorflow/g3doc/tutorials/recurrent/index.md
@@ -21,8 +21,9 @@ recognition, machine translation, or image captioning. It is also fun, too --
 take a look [here] (http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
 
 For the purpose of this tutorial, we will reproduce the results from
-[Zaremba et al., 2014] (http://arxiv.org/abs/1409.2329), which achieves very
-good results on the PTB dataset.
+[Zaremba et al., 2014](http://arxiv.org/abs/1409.2329)
+([pdf](http://arxiv.org/pdf/1409.2329.pdf)), which achieves very good results
+on the PTB dataset.
 
 ## Tutorial Files
 
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
index 50101f9a0d..3d64bcc91b 100644
--- a/tensorflow/g3doc/tutorials/seq2seq/index.md
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -38,10 +38,10 @@ File | What's in it?
 ## Sequence-to-Sequence Basics
 
 A basic sequence-to-sequence model, as introduced in
-[Cho et al., 2014](http://arxiv.org/pdf/1406.1078v3.pdf),
-consists of two recurrent neural networks (RNNs): an *encoder* that
-processes the input and a *decoder* that generates the output.
-This basic architecture is depicted below.
+[Cho et al., 2014](http://arxiv.org/abs/1406.1078)
+([pdf](http://arxiv.org/pdf/1406.1078.pdf)), consists of two recurrent neural
+networks (RNNs): an *encoder* that processes the input and a *decoder* that
+generates the output. This basic architecture is depicted below.
 
 <div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
 <img style="width:100%" src="../../images/basic_seq2seq.png" />
@@ -52,12 +52,14 @@ a GRU cell or an LSTM cell (see the [RNN Tutorial](../../tutorials/recurrent/ind
 for an explanation of those). Encoder and decoder can share weights or,
 as is more common, use a different set of parameters. Multi-layer cells
 have been successfully used in sequence-to-sequence models too, e.g. for
-translation [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215).
+translation [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215)
+([pdf](http://arxiv.org/pdf/1409.3215.pdf)).
 
 In the basic model depicted above, every input has to be encoded into
 a fixed-size state vector, as that is the only thing passed to the decoder.
 To allow the decoder more direct access to the input, an *attention* mechanism
-was introduced in [Bahdanu et al., 2014](http://arxiv.org/abs/1409.0473).
+was introduced in [Bahdanau et al., 2014](http://arxiv.org/abs/1409.0473)
+([pdf](http://arxiv.org/pdf/1409.0473.pdf)).
 We will not go into the details of the attention mechanism (see the paper),
 suffice it to say that it allows the decoder to peek into the input at every
 decoding step. A multi-layer sequence-to-sequence network with LSTM cells and
@@ -127,7 +129,8 @@ All other tensors from this list would be ignored, and instead the previous
 output of the encoder would be used. This is used for decoding translations
 in our translation model, but it can also be used during training, to make
 the model more robust to its own mistakes, similar
-to [Bengio et al., 2015](http://arxiv.org/pdf/1506.03099v2.pdf).
+to [Bengio et al., 2015](http://arxiv.org/abs/1506.03099)
+([pdf](http://arxiv.org/pdf/1506.03099.pdf)).
 
 One more important argument used above is `output_projection`. If not specified,
 the outputs of the embedding model will be tensors of shape batch-size by
@@ -137,7 +140,8 @@ When training models with large output vocabularies, i.e., when
 tensors. Instead, it is better to return smaller output tensors, which will
 later be projected onto a large output tensor using `output_projection`.
 This allows to use our seq2seq models with a sampled softmax loss, as described
-in [Jean et. al., 2015](http://arxiv.org/pdf/1412.2007v2.pdf).
+in [Jean et al., 2014](http://arxiv.org/abs/1412.2007)
+([pdf](http://arxiv.org/pdf/1412.2007.pdf)).
 
 In addition to `basic_rnn_seq2seq` and `embedding_rnn_seq2seq` there are a few
 more sequence-to-sequence models in `seq2seq.py`, take a look there. They all
@@ -225,7 +229,8 @@ Remember that when constructing decoder inputs we prepend the special `GO`
 symbol to the input data. This is done in the `get_batch()` function in
 `seq2seq_model.py`, which also reverses the input English sentence.
 Reversing the inputs was shown to improve results for the neural translation
-model in [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215).
+model in [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215)
+([pdf](http://arxiv.org/pdf/1409.3215.pdf)).
 To put it all together, imagine we have the sentence "I go.", tokenized
 as `["I", "go", "."]` as input and the sentence "Je vais." as output,
 tokenized `["Je", "vais", "."]`. It will be put in the (5, 10) bucket,
@@ -233,7 +238,7 @@ with encoder inputs representing `[PAD PAD "." "go" "I"]` and decoder
 inputs `[GO "Je" "vais" "." EOS PAD PAD PAD PAD PAD]`.
 
 
-## Let's Run It 
+## Let's Run It
 
 To train the model described above, we need to a large English-French corpus.
 We will use the *10^9-French-English corpus* from the
@@ -329,6 +334,7 @@ Finally, the model presented above can be used for any sequence-to-sequence
 task, not only for translation. Even if you want to transform a sequence to
 a tree, for example to generate a parsing tree, the same model as above can
 give state-of-the-art results, as demonstrated in
-[Vinyals & Kaiser et al., 2015](http://arxiv.org/abs/1412.7449).
+[Vinyals & Kaiser et al., 2014](http://arxiv.org/abs/1412.7449)
+([pdf](http://arxiv.org/pdf/1412.7449.pdf)).
 So you can not only build your own translator, you can also build a parser,
 a chat-bot, or any program that comes to your mind. Experiment!
diff --git a/tensorflow/g3doc/tutorials/word2vec/index.md b/tensorflow/g3doc/tutorials/word2vec/index.md
index 6cd828baaf..32323c5774 100644
--- a/tensorflow/g3doc/tutorials/word2vec/index.md
+++ b/tensorflow/g3doc/tutorials/word2vec/index.md
@@ -238,8 +238,9 @@ below (see also for example
 This explains why these vectors are also useful as features for many canonical
 NLP prediction tasks, such as part-of-speech tagging or named entity recognition
 (see for example the original work by
-[Collobert et al.](http://arxiv.org/pdf/1103.0398v1.pdf), or follow-up work by
-[Turian et al.](http://www.aclweb.org/anthology/P10-1040)).
+[Collobert et al., 2011](http://arxiv.org/abs/1103.0398)
+([pdf](http://arxiv.org/pdf/1103.0398.pdf)), or follow-up work by
+[Turian et al., 2010](http://www.aclweb.org/anthology/P10-1040)).
 
 But for now, let's just use them to draw pretty pictures!
 
diff --git a/tensorflow/models/rnn/translate/seq2seq_model.py b/tensorflow/models/rnn/translate/seq2seq_model.py
index 7046f0976f..663ea600cb 100644
--- a/tensorflow/models/rnn/translate/seq2seq_model.py
+++ b/tensorflow/models/rnn/translate/seq2seq_model.py
@@ -40,7 +40,7 @@ class Seq2SeqModel(object):
   version of this model, but with bi-directional encoder, was presented in
     http://arxiv.org/abs/1409.0473
   and sampled softmax is described in Section 3 of the following paper.
-    http://arxiv.org/pdf/1412.2007v2.pdf
+    http://arxiv.org/abs/1412.2007
   """
 
   def __init__(self, source_vocab_size, target_vocab_size, buckets, size,
diff --git a/tensorflow/models/rnn/translate/translate.py b/tensorflow/models/rnn/translate/translate.py
index c2e740292c..793de26647 100644
--- a/tensorflow/models/rnn/translate/translate.py
+++ b/tensorflow/models/rnn/translate/translate.py
@@ -25,7 +25,7 @@ the current checkpoint translates English sentences into French.
 See the following papers for more information on neural translation models.
  * http://arxiv.org/abs/1409.3215
  * http://arxiv.org/abs/1409.0473
- * http://arxiv.org/pdf/1412.2007v2.pdf
+ * http://arxiv.org/abs/1412.2007
 """
 from __future__ import absolute_import
 from __future__ import division
diff --git a/tensorflow/python/BUILD b/tensorflow/python/BUILD
index 172d09da72..b21c338599 100644
--- a/tensorflow/python/BUILD
+++ b/tensorflow/python/BUILD
@@ -1145,7 +1145,6 @@ py_binary(
     srcs_version = "PY2AND3",
     deps = [
         ":docs",
-        ":platform",
         "//tensorflow:tensorflow_py",
     ],
 )
diff --git a/tensorflow/python/client/session_test.py b/tensorflow/python/client/session_test.py
index 5110b70bf4..48978ecd52 100644
--- a/tensorflow/python/client/session_test.py
+++ b/tensorflow/python/client/session_test.py
@@ -862,6 +862,29 @@ class SessionTest(test_util.TensorFlowTestCase):
       res = sess.partial_run(h2, r2, feed_dict={c: 7})
       self.assertEqual(462, res)
 
+  def testManyPartialRun(self):
+    with session.Session() as sess:
+      steps = 200
+      inputs = []
+      outputs = []
+      a = constant_op.constant(2.0, dtypes.float32)
+      for i in xrange(steps):
+        inputs.append(array_ops.placeholder(dtypes.float32, shape=[]))
+        a = math_ops.mul(a, inputs[i])
+        outputs.append(a)
+
+      h = sess.partial_run_setup(outputs, inputs)
+      for i in xrange(steps):
+        res = sess.partial_run(h, outputs[i], feed_dict={inputs[i]: 1.0})
+      self.assertEqual(2.0, res)
+
+      feed_dict = {}
+      for i in xrange(steps):
+        feed_dict[inputs[i]] = 1.0
+      res = sess.run(outputs, feed_dict)
+      self.assertEqual(steps, len(res))
+      self.assertEqual(2.0, res[-1])
+
   def testFeedDictKeyException(self):
     with session.Session() as sess:
       a = constant_op.constant(1.0, dtypes.float32, name='a')
diff --git a/tensorflow/python/framework/docs.py b/tensorflow/python/framework/docs.py
index aefbffdc47..2f9201592a 100644
--- a/tensorflow/python/framework/docs.py
+++ b/tensorflow/python/framework/docs.py
@@ -17,6 +17,7 @@
 
 Updates the documentation files.
 """
+
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
diff --git a/tensorflow/python/framework/gen_docs_combined.py b/tensorflow/python/framework/gen_docs_combined.py
index 9f83211f1d..7dbadef37e 100644
--- a/tensorflow/python/framework/gen_docs_combined.py
+++ b/tensorflow/python/framework/gen_docs_combined.py
@@ -35,22 +35,24 @@ tf.flags.DEFINE_boolean("print_hidden_regex", False,
 FLAGS = tf.flags.FLAGS
 
 
-# TODO(josh11b,wicke): Remove the ../../api_docs/python/ once the
-# website can handle it.
 PREFIX_TEXT = """
 Note: Functions taking `Tensor` arguments can also take anything accepted by
-[`tf.convert_to_tensor`](../../api_docs/python/framework.md#convert_to_tensor).
+[`tf.convert_to_tensor`](framework.md#convert_to_tensor).
 """
 
 
 def get_module_to_name():
-  return {tf: "tf",
-          tf.errors: "tf.errors",
-          tf.image: "tf.image",
-          tf.nn: "tf.nn",
-          tf.train: "tf.train",
-          tf.python_io: "tf.python_io",
-          tf.test: "tf.test",}
+  return {
+    tf: "tf",
+    tf.errors: "tf.errors",
+    tf.image: "tf.image",
+    tf.nn: "tf.nn",
+    tf.train: "tf.train",
+    tf.python_io: "tf.python_io",
+    tf.test: "tf.test",
+    tf.contrib.layers: "tf.contrib.layers",
+    tf.contrib.util: "tf.contrib.util",
+  }
 
 def all_libraries(module_to_name, members, documented):
   # A list of (filename, docs.Library) pairs representing the individual files
@@ -114,6 +116,8 @@ def all_libraries(module_to_name, members, documented):
                                "RankingExample", "SequenceExample"]),
       library("script_ops", "Wraps python functions", prefix=PREFIX_TEXT),
       library("test", "Testing", tf.test),
+      library("contrib.layers", "Layers (contrib)", tf.contrib.layers),
+      library("contrib.util", "Utilities (contrib)", tf.contrib.util),
   ]
 
 _hidden_symbols = ["Event", "LogMessage", "Summary", "SessionLog", "xrange",
diff --git a/tensorflow/python/framework/load_library.py b/tensorflow/python/framework/load_library.py
index 95299d9350..c455806734 100644
--- a/tensorflow/python/framework/load_library.py
+++ b/tensorflow/python/framework/load_library.py
@@ -60,7 +60,7 @@ def load_op_library(library_filename):
 
   op_list_str = py_tf.TF_GetOpList(lib_handle)
   op_list = op_def_pb2.OpList()
-  op_list.ParseFromString(bytes(op_list_str))
+  op_list.ParseFromString(compat.as_bytes(op_list_str))
   wrappers = py_tf.GetPythonWrappers(op_list_str, len(op_list_str))
 
   # Get a unique name for the module.
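The switch from `bytes(op_list_str)` to `compat.as_bytes(op_list_str)` in the hunk above matters on Python 3, where `bytes()` rejects a `str` argument unless an encoding is given. The sketch below illustrates the idea with a hypothetical stand-in named `as_bytes`; it is not TensorFlow's actual `compat.as_bytes` implementation, and the UTF-8 default is an assumption.

```python
def as_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding text if needed.

    Hypothetical stand-in for illustration; not TensorFlow's code.
    """
    if isinstance(value, bytes):
        return value  # already bytes: pass through unchanged
    if isinstance(value, str):
        return value.encode(encoding)  # text: encode explicitly
    raise TypeError("Expected bytes or str, got %r" % (value,))


# On Python 3, plain bytes() refuses a str without an encoding.
try:
    bytes("OpList")
except TypeError as err:
    print("bytes(str) failed:", err)

print(as_bytes("OpList"))
print(as_bytes(b"OpList"))
```

A helper of this shape lets the same call site accept either text or bytes, which is what a protobuf `ParseFromString` caller needs when the wrapper's return type differs between Python versions.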
diff --git a/tensorflow/python/framework/ops.py b/tensorflow/python/framework/ops.py
index 5f9be0a576..b777438e7a 100644
--- a/tensorflow/python/framework/ops.py
+++ b/tensorflow/python/framework/ops.py
@@ -2901,7 +2901,7 @@ class Graph(object):
   # pylint: enable=g-doc-return-or-yield
 
 
-def device(dev):
+def device(device_name_or_function):
   """Wrapper for `Graph.device()` using the default graph.
 
   See
@@ -2916,7 +2916,7 @@ def device(dev):
     A context manager that specifies the default device to use for newly
     created ops.
   """
-  return get_default_graph().device(dev)
+  return get_default_graph().device(device_name_or_function)
 
 
 def name_scope(name):
diff --git a/tensorflow/python/kernel_tests/bitcast_op_test.py b/tensorflow/python/kernel_tests/bitcast_op_test.py
index 2e4411e9b5..6b97eb866d 100644
--- a/tensorflow/python/kernel_tests/bitcast_op_test.py
+++ b/tensorflow/python/kernel_tests/bitcast_op_test.py
@@ -28,8 +28,8 @@ class BitcastTest(tf.test.TestCase):
     with self.test_session():
       tf_ans = tf.bitcast(x, datatype)
       out = tf_ans.eval()
-      buff_after = np.getbuffer(out)
-      buff_before = np.getbuffer(x)
+      buff_after = memoryview(out).tobytes()
+      buff_before = memoryview(x).tobytes()
       self.assertEqual(buff_before, buff_after)
       self.assertEqual(tf_ans.get_shape(), shape)
 
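The test above now compares raw buffers with `memoryview(...).tobytes()` instead of `np.getbuffer`, which is not available on Python 3. A minimal NumPy-only sketch of the same check follows. Here `ndarray.view` is used as a stand-in for `tf.bitcast` (an assumption made so the example runs without TensorFlow), and `raw_bytes` is a hypothetical helper name.

```python
import numpy as np


def raw_bytes(array):
    """Return the underlying buffer of a contiguous array as bytes."""
    return memoryview(array).tobytes()


x = np.array([1.0, -2.5, 3.25], dtype=np.float32)
# Reinterpret the same 4-byte elements as int32 without copying,
# standing in for what a bitcast is expected to do.
y = x.view(np.int32)

same_bytes = raw_bytes(x) == raw_bytes(y)
print(same_bytes)
```

The values in `y` look nothing like those in `x`, but the byte strings match, which is the property the bitcast test asserts.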
diff --git a/tensorflow/python/kernel_tests/scatter_ops_test.py b/tensorflow/python/kernel_tests/scatter_ops_test.py
index 1472aaed13..357c8f1a98 100644
--- a/tensorflow/python/kernel_tests/scatter_ops_test.py
+++ b/tensorflow/python/kernel_tests/scatter_ops_test.py
@@ -22,47 +22,89 @@ import numpy as np
 import tensorflow as tf
 
 
+def _AsType(v, vtype):
+  return v.astype(vtype) if isinstance(v, np.ndarray) else vtype(v)
+
+
+def _NumpyAdd(ref, indices, updates):
+  # Since numpy advanced assignment does not support repeated indices,
+  # we run a simple loop to perform scatter_add.
+  for i, indx in np.ndenumerate(indices):
+    ref[indx] += updates[i]
+
+
+def _NumpySub(ref, indices, updates):
+  for i, indx in np.ndenumerate(indices):
+    ref[indx] -= updates[i]
+
+
 class ScatterTest(tf.test.TestCase):
 
-  def _VariableRankTest(self, np_scatter, tf_scatter):
+  def _VariableRankTest(self, np_scatter, tf_scatter, vtype, itype, use_gpu,
+                        repeat_indices=False):
     np.random.seed(8)
-    with self.test_session():
-      for indices_shape in (), (2,), (2, 3), (2, 3, 4):
-        for extra_shape in (), (5,), (5, 6):
+    with self.test_session(use_gpu=use_gpu):
+      for indices_shape in (), (2,), (3, 7), (3, 4, 7):
+        for extra_shape in (), (5,), (5, 9):
           # Generate random indices with no duplicates for easy numpy comparison
-          size = np.prod(indices_shape, dtype=np.int32)
-          indices = np.arange(2 * size)
+          size = np.prod(indices_shape, dtype=itype)
+          first_dim = 3 * size
+          indices = np.arange(first_dim)
           np.random.shuffle(indices)
-          indices = indices[:size].reshape(indices_shape)
-          updates = np.random.randn(*(indices_shape + extra_shape))
-          old = np.random.randn(*((2 * size,) + extra_shape))
-        # Scatter via numpy
-        new = old.copy()
-        np_scatter(new, indices, updates)
-        # Scatter via tensorflow
-        ref = tf.Variable(old)
-        ref.initializer.run()
-        tf_scatter(ref, indices, updates).eval()
-        # Compare
-        self.assertAllClose(ref.eval(), new)
+          indices = indices[:size]
+          if size > 1 and repeat_indices:
+            # Add some random repeats.
+            indices = indices[:size//2]
+            for _ in range(size-size//2):
+              # Randomly append some repeats.
+              indices = np.append(indices, indices[np.random.randint(size//2)])
+            np.random.shuffle(indices)
+          indices = indices.reshape(indices_shape)
+          updates = _AsType(np.random.randn(*(indices_shape + extra_shape)),
+                            vtype)
+          old = _AsType(np.random.randn(*((first_dim,) + extra_shape)), vtype)
+
+          # Scatter via numpy
+          new = old.copy()
+          np_scatter(new, indices, updates)
+          # Scatter via tensorflow
+          ref = tf.Variable(old)
+          ref.initializer.run()
+          tf_scatter(ref, indices, updates).eval()
+          # Compare
+          self.assertAllClose(ref.eval(), new)
+
+  def _VariableRankTests(self, np_scatter, tf_scatter):
+    for vtype in (np.float32, np.float64):
+      for itype in (np.int32, np.int64):
+        for use_gpu in (False, True):
+          self._VariableRankTest(np_scatter, tf_scatter, vtype, itype, use_gpu)
 
   def testVariableRankUpdate(self):
     def update(ref, indices, updates):
       ref[indices] = updates
-    self._VariableRankTest(update, tf.scatter_update)
+    self._VariableRankTests(update, tf.scatter_update)
 
   def testVariableRankAdd(self):
-    def add(ref, indices, updates):
-      ref[indices] += updates
-    self._VariableRankTest(add, tf.scatter_add)
+    self._VariableRankTests(_NumpyAdd, tf.scatter_add)
 
   def testVariableRankSub(self):
-    def sub(ref, indices, updates):
-      ref[indices] -= updates
-    self._VariableRankTest(sub, tf.scatter_sub)
+    self._VariableRankTests(_NumpySub, tf.scatter_sub)
+
+  def _ScatterRepeatIndicesTest(self, np_scatter, tf_scatter):
+    for vtype in (np.float32, np.float64):
+      for itype in (np.int32, np.int64):
+        for use_gpu in (False, True):
+          self._VariableRankTest(np_scatter, tf_scatter, vtype, itype, use_gpu,
+                                 repeat_indices=True)
+
+  def testScatterRepeatIndices(self):
+    """This tests scatter_add and scatter_sub using indices that repeat."""
+    self._ScatterRepeatIndicesTest(_NumpyAdd, tf.scatter_add)
+    self._ScatterRepeatIndicesTest(_NumpySub, tf.scatter_sub)
 
   def testBooleanScatterUpdate(self):
-    with self.test_session() as session:
+    with self.test_session(use_gpu=False) as session:
       var = tf.Variable([True, False])
       update0 = tf.scatter_update(var, 1, True)
       update1 = tf.scatter_update(var, tf.constant(0, dtype=tf.int64), False)
@@ -72,6 +114,50 @@ class ScatterTest(tf.test.TestCase):
 
       self.assertAllEqual([False, True], var.eval())
 
+  def testScatterOutOfRangeCpu(self):
+    for op in (tf.scatter_add, tf.scatter_sub, tf.scatter_update):
+      params = np.array([1, 2, 3, 4, 5, 6]).astype(np.float32)
+      updates = np.array([-3, -4, -5]).astype(np.float32)
+      with self.test_session(use_gpu=False):
+        ref = tf.Variable(params)
+        ref.initializer.run()
+
+        # Indices all in range, no problem.
+        indices = np.array([2, 0, 5])
+        op(ref, indices, updates).eval()
+
+        # Test some out of range errors.
+        indices = np.array([-1, 0, 5])
+        with self.assertRaisesOpError('indices is out of range'):
+          op(ref, indices, updates).eval()
+
+        indices = np.array([2, 0, 6])
+        with self.assertRaisesOpError('indices is out of range'):
+          op(ref, indices, updates).eval()
+
+  # TODO(fpmc): Re-enable this test when gpu_pip test actually runs on a GPU.
+  def _disabledTestScatterOutOfRangeGpu(self):
+    if not tf.test.IsBuiltWithCuda():
+      return
+    for op in (tf.scatter_add, tf.scatter_sub, tf.scatter_update):
+      params = np.array([1, 2, 3, 4, 5, 6]).astype(np.float32)
+      updates = np.array([-3, -4, -5]).astype(np.float32)
+      # With GPU, the code ignores indices that are out of range.
+      # We don't test the implementation; just test that there are no failures.
+      with self.test_session(force_gpu=True):
+        ref = tf.Variable(params)
+        ref.initializer.run()
+
+        # Indices all in range, no problem.
+        indices = np.array([2, 0, 5])
+        op(ref, indices, updates).eval()
+
+        # Indices out of range should not fail.
+        indices = np.array([-1, 0, 5])
+        op(ref, indices, updates).eval()
+        indices = np.array([2, 0, 6])
+        op(ref, indices, updates).eval()
+
 
 if __name__ == "__main__":
   tf.test.main()
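
Note on the repeat-indices tests above: `_NumpyAdd` and `_NumpySub` are used here but not defined in the hunks shown, so their bodies are not visible. The removed inline helpers used `ref[indices] += updates`, which NumPy does not accumulate when an index repeats. The sketch below illustrates that difference with `np.add.at`. The name `numpy_scatter_add` is hypothetical, and this is one possible reference implementation, not necessarily the one the commit uses.

```python
import numpy as np


def numpy_scatter_add(ref, indices, updates):
    """Hypothetical reference: add updates into ref, accumulating repeats."""
    # np.add.at performs an unbuffered in-place add, so every occurrence
    # of a repeated index contributes to the result.
    np.add.at(ref, indices, updates)
    return ref


ref = np.zeros(4, dtype=np.float32)
indices = np.array([1, 1, 3])
updates = np.array([1.0, 2.0, 5.0], dtype=np.float32)

# Buffered fancy-index add: only one of the two updates to index 1 survives.
buffered = ref.copy()
buffered[indices] += updates

# Unbuffered add: both updates to index 1 are summed.
accumulated = numpy_scatter_add(ref.copy(), indices, updates)
print(buffered, accumulated)
```

A reference of this kind is what makes a comparison against `tf.scatter_add` meaningful when `repeat_indices=True`.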
diff --git a/tensorflow/python/ops/clip_ops.py b/tensorflow/python/ops/clip_ops.py
index a85eb85a82..fea6d44476 100644
--- a/tensorflow/python/ops/clip_ops.py
+++ b/tensorflow/python/ops/clip_ops.py
@@ -157,8 +157,8 @@ def clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None):
   Any of the entries of `t_list` that are of type `None` are ignored.
 
   This is the correct way to perform gradient clipping (for example, see
-  R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training
-  Recurrent Neural Networks".  http://arxiv.org/abs/1211.5063)
+  [Pascanu et al., 2012](http://arxiv.org/abs/1211.5063)
+  ([pdf](http://arxiv.org/pdf/1211.5063.pdf))).
 
   However, it is slower than `clip_by_norm()` because all the parameters must be
   ready before the clipping operation can be performed.
diff --git a/tensorflow/python/ops/image_ops.py b/tensorflow/python/ops/image_ops.py
index 89c2f96d19..5170527860 100644
--- a/tensorflow/python/ops/image_ops.py
+++ b/tensorflow/python/ops/image_ops.py
@@ -109,9 +109,9 @@ Example:
 
 ```python
 # Decode an image and convert it to HSV.
-rgb_image = tf.decode_png(...,  channels=3)
-rgb_image_float = tf.convert_image_dtype(rgb_image, tf.float32)
-hsv_image = tf.rgb_to_hsv(rgb_image)
+rgb_image = tf.image.decode_png(...,  channels=3)
+rgb_image_float = tf.image.convert_image_dtype(rgb_image, tf.float32)
+hsv_image = tf.image.rgb_to_hsv(rgb_image)
 ```
 
 @@rgb_to_grayscale
@@ -776,7 +776,7 @@ def adjust_contrast(images, contrast_factor):
     contrast_factor: A float multiplier for adjusting contrast.
 
   Returns:
-    The constrast-adjusted image or images.
+    The contrast-adjusted image or images.
   """
   with ops.op_scope([images, contrast_factor], None, 'adjust_contrast') as name:
     # Remember original dtype to so we can convert back if needed
diff --git a/tensorflow/python/ops/init_ops.py b/tensorflow/python/ops/init_ops.py
index eda688b1a9..c933d63902 100644
--- a/tensorflow/python/ops/init_ops.py
+++ b/tensorflow/python/ops/init_ops.py
@@ -161,7 +161,8 @@ def uniform_unit_scaling_initializer(factor=1.0, seed=None,
   A similar calculation for convolutional networks gives an analogous result
   with `dim` equal to the product of the first 3 dimensions.  When
   nonlinearities are present, we need to multiply this by a constant `factor`.
-  See <https://arxiv.org/pdf/1412.6558v3.pdf> for deeper motivation, experiments
+  See [Sussillo et al., 2014](https://arxiv.org/abs/1412.6558)
+  ([pdf](http://arxiv.org/pdf/1412.6558.pdf)) for deeper motivation, experiments
   and the calculation of constants. In section 2.3 there, the constants were
   numerically computed: for a linear layer it's 1.0, relu: ~1.43, tanh: ~1.15.
 
diff --git a/tensorflow/python/ops/math_ops.py b/tensorflow/python/ops/math_ops.py
index e8337fad07..74c80b2933 100644
--- a/tensorflow/python/ops/math_ops.py
+++ b/tensorflow/python/ops/math_ops.py
@@ -254,7 +254,7 @@ def pow(x, y, name=None):
   corresponding elements in `x` and `y`. For example:
 
   ```
-  # tensor 'x' is [[2, 2]], [3, 3]]
+  # tensor 'x' is [[2, 2], [3, 3]]
   # tensor 'y' is [[8, 16], [2, 3]]
   tf.pow(x, y) ==> [[256, 65536], [9, 27]]
   ```
@@ -1080,7 +1080,7 @@ def accumulate_n(inputs, shape=None, tensor_dtype=None, name=None):
   For example:
 
   ```python
-  # tensor 'a' is [[1, 2], [3, 4]
+  # tensor 'a' is [[1, 2], [3, 4]]
   # tensor `b` is [[5, 0], [0, 6]]
   tf.accumulate_n([a, b, a]) ==> [[7, 4], [6, 14]]
 
diff --git a/tensorflow/python/ops/nn.py b/tensorflow/python/ops/nn.py
index ce17b79f0d..6120ad1a02 100644
--- a/tensorflow/python/ops/nn.py
+++ b/tensorflow/python/ops/nn.py
@@ -824,7 +824,8 @@ def sampled_softmax_loss(weights, biases, inputs, labels, num_sampled,
   See our [Candidate Sampling Algorithms Reference]
   (../../extras/candidate_sampling.pdf)
 
-  Also see Section 3 of http://arxiv.org/abs/1412.2007 for the math.
+  Also see Section 3 of [Jean et al., 2014](http://arxiv.org/abs/1412.2007)
+  ([pdf](http://arxiv.org/pdf/1412.2007.pdf)) for the math.
 
   Args:
     weights: A `Tensor` of shape `[num_classes, dim]`, or a list of `Tensor`
diff --git a/tensorflow/python/ops/rnn_cell.py b/tensorflow/python/ops/rnn_cell.py
index 11539d94f3..c9dfbca979 100644
--- a/tensorflow/python/ops/rnn_cell.py
+++ b/tensorflow/python/ops/rnn_cell.py
@@ -159,7 +159,7 @@ class GRUCell(RNNCell):
 class BasicLSTMCell(RNNCell):
   """Basic LSTM recurrent network cell.
 
-  The implementation is based on: http://arxiv.org/pdf/1409.2329v5.pdf.
+  The implementation is based on: http://arxiv.org/abs/1409.2329.
 
   We add forget_bias (default: 1) to the biases of the forget gate in order to
   reduce the scale of forgetting in the beginning of the training.
diff --git a/tensorflow/python/ops/seq2seq.py b/tensorflow/python/ops/seq2seq.py
index 16738b7666..2fc2617b84 100644
--- a/tensorflow/python/ops/seq2seq.py
+++ b/tensorflow/python/ops/seq2seq.py
@@ -84,7 +84,7 @@ def rnn_decoder(decoder_inputs, initial_state, cell, loop_function=None,
     loop_function: If not None, this function will be applied to the i-th output
       in order to generate the i+1-st input, and decoder_inputs will be ignored,
       except for the first element ("GO" symbol). This can be used for decoding,
-      but also for training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
+      but also for training to emulate http://arxiv.org/abs/1506.03099.
       Signature -- loop_function(prev, i) = next
         * prev is a 2D Tensor of shape [batch_size x cell.output_size],
         * i is an integer, the step number (when advanced control is needed),
@@ -198,7 +198,7 @@ def embedding_rnn_decoder(decoder_inputs, initial_state, cell, num_symbols,
       used (the "GO" symbol), and all other decoder inputs will be generated by:
         next = embedding_lookup(embedding, argmax(previous_output)),
       In effect, this implements a greedy decoder. It can also be used
-      during training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
+      during training to emulate http://arxiv.org/abs/1506.03099.
       If False, decoder_inputs are used as given (the standard decoder case).
     scope: VariableScope for the created subgraph; defaults to
       "embedding_rnn_decoder".
@@ -428,7 +428,7 @@ def attention_decoder(decoder_inputs, initial_state, attention_states, cell,
     loop_function: If not None, this function will be applied to i-th output
       in order to generate i+1-th input, and decoder_inputs will be ignored,
       except for the first element ("GO" symbol). This can be used for decoding,
-      but also for training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
+      but also for training to emulate http://arxiv.org/abs/1506.03099.
       Signature -- loop_function(prev, i) = next
         * prev is a 2D Tensor of shape [batch_size x cell.output_size],
         * i is an integer, the step number (when advanced control is needed),
@@ -569,7 +569,7 @@ def embedding_attention_decoder(decoder_inputs, initial_state, attention_states,
       used (the "GO" symbol), and all other decoder inputs will be generated by:
         next = embedding_lookup(embedding, argmax(previous_output)),
       In effect, this implements a greedy decoder. It can also be used
-      during training to emulate http://arxiv.org/pdf/1506.03099v2.pdf.
+      during training to emulate http://arxiv.org/abs/1506.03099.
       If False, decoder_inputs are used as given (the standard decoder case).
     dtype: The dtype to use for the RNN initial states (default: tf.float32).
     scope: VariableScope for the created subgraph; defaults to
@@ -716,7 +716,7 @@ def one2many_rnn_seq2seq(encoder_inputs, decoder_inputs_dict, cell,
 
   This is a multi-task sequence-to-sequence model with one encoder and multiple
   decoders. Reference to multi-task sequence-to-sequence learning can be found
-  here: http://arxiv.org/pdf/1511.06114v2.pdf
+  here: http://arxiv.org/abs/1511.06114
 
   Args:
     encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
@@ -757,7 +757,7 @@ def one2many_rnn_seq2seq(encoder_inputs, decoder_inputs_dict, cell,
     _, encoder_state = rnn.rnn(encoder_cell, encoder_inputs, dtype=dtype)
 
     # Decoder.
-    for name, decoder_inputs in decoder_inputs_dict.iteritems():
+    for name, decoder_inputs in decoder_inputs_dict.items():
       num_decoder_symbols = num_decoder_symbols_dict[name]
 
       with variable_scope.variable_scope("one2many_decoder_" + str(name)):
diff --git a/tensorflow/python/tools/freeze_graph.py b/tensorflow/python/tools/freeze_graph.py
index 97dc374e3a..8099d54e0f 100644
--- a/tensorflow/python/tools/freeze_graph.py
+++ b/tensorflow/python/tools/freeze_graph.py
@@ -87,7 +87,8 @@ def freeze_graph(input_graph, input_saver, input_binary, input_checkpoint,
     return -1
 
   input_graph_def = tf.GraphDef()
-  with open(input_graph, "rb") as f:
+  mode = "rb" if input_binary else "r"
+  with open(input_graph, mode) as f:
     if input_binary:
       input_graph_def.ParseFromString(f.read())
     else:
@@ -101,7 +102,7 @@ def freeze_graph(input_graph, input_saver, input_binary, input_checkpoint,
 
   with tf.Session() as sess:
     if input_saver:
-      with open(input_saver, "rb") as f:
+      with open(input_saver, mode) as f:
         saver_def = tf.train.SaverDef()
         if input_binary:
           saver_def.ParseFromString(f.read())
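
Note on the `freeze_graph.py` hunks above: the file mode now follows `input_binary`, so serialized protos are read as bytes and text-format protos as text. A minimal sketch of that selection, independent of TensorFlow; the helper name `read_proto_payload` and the sample file contents are assumptions made for illustration only.

```python
import os
import tempfile


def read_proto_payload(path, input_binary):
    """Hypothetical helper: read a serialized or text-format proto file.

    Binary protos are read as bytes ("rb"); text-format protos are read
    as text ("r"), mirroring the mode selection in the hunk above.
    """
    mode = "rb" if input_binary else "r"
    with open(path, mode) as f:
        return f.read()


# Demonstrate on a throwaway file; the contents are a made-up pbtxt fragment.
fd, tmp_path = tempfile.mkstemp(suffix=".pbtxt")
with os.fdopen(fd, "w") as f:
    f.write('node { name: "x" }\n')

as_text = read_proto_payload(tmp_path, input_binary=False)
as_bytes = read_proto_payload(tmp_path, input_binary=True)
os.remove(tmp_path)
```

Under Python 3 the two calls return `str` and `bytes` respectively, which is the distinction the mode switch preserves.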
diff --git a/tensorflow/python/training/adam.py b/tensorflow/python/training/adam.py
index e691d1b1bf..a258ee57e2 100644
--- a/tensorflow/python/training/adam.py
+++ b/tensorflow/python/training/adam.py
@@ -30,7 +30,8 @@ from tensorflow.python.training import training_ops
 class AdamOptimizer(optimizer.Optimizer):
   """Optimizer that implements the Adam algorithm.
 
-  See this [paper](http://arxiv.org/pdf/1412.6980v7.pdf).
+  See [Kingma et al., 2014](http://arxiv.org/abs/1412.6980)
+  ([pdf](http://arxiv.org/pdf/1412.6980.pdf)).
 
   @@__init__
   """
diff --git a/tensorflow/python/training/saver_test.py b/tensorflow/python/training/saver_test.py
index b7bd2bfd3a..f65aebd300 100644
--- a/tensorflow/python/training/saver_test.py
+++ b/tensorflow/python/training/saver_test.py
@@ -456,18 +456,18 @@ class MaxToKeepTest(tf.test.TestCase):
 
       s1 = save.save(sess, os.path.join(save_dir, "s1"))
       self.assertEqual([s1], save.last_checkpoints)
-      self.assertEquals(2, len(gfile.Glob(s1)))
+      self.assertEqual(2, len(gfile.Glob(s1)))
 
       s2 = save.save(sess, os.path.join(save_dir, "s2"))
       self.assertEqual([s1, s2], save.last_checkpoints)
-      self.assertEquals(2, len(gfile.Glob(s1)))
-      self.assertEquals(2, len(gfile.Glob(s2)))
+      self.assertEqual(2, len(gfile.Glob(s1)))
+      self.assertEqual(2, len(gfile.Glob(s2)))
 
       s3 = save.save(sess, os.path.join(save_dir, "s3"))
       self.assertEqual([s2, s3], save.last_checkpoints)
-      self.assertEquals(0, len(gfile.Glob(s1)))
-      self.assertEquals(2, len(gfile.Glob(s2)))
-      self.assertEquals(2, len(gfile.Glob(s3)))
+      self.assertEqual(0, len(gfile.Glob(s1)))
+      self.assertEqual(2, len(gfile.Glob(s2)))
+      self.assertEqual(2, len(gfile.Glob(s3)))
 
 
 class KeepCheckpointEveryNHoursTest(tf.test.TestCase):
@@ -654,7 +654,7 @@ class LatestCheckpointWithRelativePaths(tf.test.TestCase):
 
           # Restore "v0" from that checkpoint.
           save.restore(sess, name)
-          self.assertEquals(v0.eval(), 2.0)
+          self.assertEqual(v0.eval(), 2.0)
 
 
 class CheckpointStateTest(tf.test.TestCase):
diff --git a/tensorflow/python/training/summary_io.py b/tensorflow/python/training/summary_io.py
index 3bd0413e3f..1257230df9 100644
--- a/tensorflow/python/training/summary_io.py
+++ b/tensorflow/python/training/summary_io.py
@@ -234,7 +234,7 @@ def summary_iterator(path):
   Example: Print the contents of an events file.
 
   ```python
-  for e in tf.summary_iterator(path to events file):
+  for e in tf.train.summary_iterator(path to events file):
       print(e)
   ```
 
@@ -245,7 +245,7 @@ def summary_iterator(path):
   # summary value tag 'loss'.  These could have been added by calling
   # `add_summary()`, passing the output of a scalar summary op created with
   # with: `tf.scalar_summary(['loss'], loss_tensor)`.
-  for e in tf.summary_iterator(path to events file):
+  for e in tf.train.summary_iterator(path to events file):
       for v in e.summary.value:
           if v.tag == 'loss':
               print(v.simple_value)
diff --git a/tensorflow/stream_executor/dso_loader.cc b/tensorflow/stream_executor/dso_loader.cc
index 8a7d0925ce..fefea9144d 100644
--- a/tensorflow/stream_executor/dso_loader.cc
+++ b/tensorflow/stream_executor/dso_loader.cc
@@ -37,11 +37,11 @@ namespace internal {
 
 // TensorFlow OSS configure uses the following lines to configure versions. For
 // any modifications of the format, please make sure the script still works.
-string GetCudaVersion() { return "7.0"; }
-string GetCudnnVersion() { return "6.5"; }
+string GetCudaVersion() { return ""; }
+string GetCudnnVersion() { return ""; }
 
 /* static */ port::Status DsoLoader::GetCublasDsoHandle(void** dso_handle) {
-  return GetDsoHandle(FindDsoPath("libcublas.so." + GetCudaVersion(),
+  return GetDsoHandle(FindDsoPath("libcublas.so" + GetCudaVersion(),
                                   "third_party/gpus/cuda/lib64"),
                       dso_handle);
 }
@@ -51,19 +51,19 @@ string GetCudnnVersion() { return "6.5"; }
   // different version number than other CUDA libraries.  See b/22397368 for
   // some details about the complications surrounding this.
   return GetDsoHandle(
-      FindDsoPath("libcudnn.so." + GetCudnnVersion(),
+      FindDsoPath("libcudnn.so" + GetCudnnVersion(),
                   "third_party/gpus/cuda/lib64"),
       dso_handle);
 }
 
 /* static */ port::Status DsoLoader::GetCufftDsoHandle(void** dso_handle) {
-  return GetDsoHandle(FindDsoPath("libcufft.so." + GetCudaVersion(),
+  return GetDsoHandle(FindDsoPath("libcufft.so" + GetCudaVersion(),
                                   "third_party/gpus/cuda/lib64"),
                       dso_handle);
 }
 
 /* static */ port::Status DsoLoader::GetCurandDsoHandle(void** dso_handle) {
-  return GetDsoHandle(FindDsoPath("libcurand.so." + GetCudaVersion(),
+  return GetDsoHandle(FindDsoPath("libcurand.so" + GetCudaVersion(),
                                   "third_party/gpus/cuda/lib64"),
                       dso_handle);
 }
@@ -76,7 +76,7 @@ string GetCudnnVersion() { return "6.5"; }
 
 /* static */ port::Status DsoLoader::GetLibcuptiDsoHandle(void** dso_handle) {
   return GetDsoHandle(
-      FindDsoPath("libcupti.so." + GetCudaVersion(),
+      FindDsoPath("libcupti.so" + GetCudaVersion(),
                   "third_party/gpus/cuda/extras/CUPTI/lib64"),
       dso_handle);
 }
diff --git a/tensorflow/stream_executor/platform/default/mutex.h b/tensorflow/stream_executor/platform/default/mutex.h
index 3b203c285c..0ce1eeadbb 100644
--- a/tensorflow/stream_executor/platform/default/mutex.h
+++ b/tensorflow/stream_executor/platform/default/mutex.h
@@ -45,20 +45,29 @@ typedef std::mutex BaseMutex;
 
 // A class that wraps around the std::mutex implementation, only adding an
 // additional LinkerInitialized constructor interface.
-class mutex : public BaseMutex {
+class LOCKABLE mutex : public BaseMutex {
  public:
   mutex() {}
   // The default implementation of std::mutex is safe to use after the linker
   // initializations
   explicit mutex(LinkerInitialized x) {}
+
+  void lock() ACQUIRE() { BaseMutex::lock(); }
+  void unlock() RELEASE() { BaseMutex::unlock(); }
 };
 
-typedef std::unique_lock<BaseMutex> mutex_lock;
+class SCOPED_LOCKABLE mutex_lock : public std::unique_lock<BaseMutex> {
+ public:
+  mutex_lock(class mutex& m) ACQUIRE(m) : std::unique_lock<BaseMutex>(m) {}
+  ~mutex_lock() RELEASE() {}
+};
 
 #ifdef STREAM_EXECUTOR_USE_SHARED_MUTEX
+// TODO(vrv): Annotate these with ACQUIRE_SHARED after implementing
+// as classes.
 typedef std::shared_lock<BaseMutex> shared_lock;
 #else
-typedef std::unique_lock<BaseMutex> shared_lock;
+typedef mutex_lock shared_lock;
 #endif
 
 using std::condition_variable;
diff --git a/tensorflow/tensorboard/BUILD b/tensorflow/tensorboard/BUILD
index d5c22cf002..4e38903aed 100644
--- a/tensorflow/tensorboard/BUILD
+++ b/tensorflow/tensorboard/BUILD
@@ -17,15 +17,14 @@ filegroup(
     ] + glob(["lib/**/*"]),
 )
 
-
 py_binary(
     name = "tensorboard",
     srcs = ["tensorboard.py"],
     data = [":frontend"],
     srcs_version = "PY2AND3",
     deps = [
-        "//tensorflow/tensorboard/backend:server",
         "//tensorflow/python:platform",
+        "//tensorflow/tensorboard/backend:server",
     ],
 )
 
diff --git a/tensorflow/tensorboard/backend/handler_test.py b/tensorflow/tensorboard/backend/handler_test.py
index 0a97b20e3d..958a83c27c 100644
--- a/tensorflow/tensorboard/backend/handler_test.py
+++ b/tensorflow/tensorboard/backend/handler_test.py
@@ -22,6 +22,8 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 
+from six.moves import xrange
+
 from tensorflow.python.platform import googletest
 from tensorflow.tensorboard.backend import handler
 
diff --git a/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html
index 268c42a3d6..0df9b6d080 100644
--- a/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html
+++ b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html
@@ -39,6 +39,7 @@ is high)
         <template
           is="dom-repeat"
           items="[[_runs]]"
+          sort="_sort"
           as="run"
         >
         <div class="run-name-cell noshrink">
@@ -60,6 +61,7 @@ is high)
             <template
               is="dom-repeat"
               items="[[_runs]]"
+              sort="_sort"
               as="run"
             >
               <div class="image-cell noshrink">
diff --git a/tensorflow/tensorboard/dist/tf-tensorboard.html b/tensorflow/tensorboard/dist/tf-tensorboard.html
index 78e975886e..a5d28c7c41 100644
--- a/tensorflow/tensorboard/dist/tf-tensorboard.html
+++ b/tensorflow/tensorboard/dist/tf-tensorboard.html
@@ -2333,7 +2333,7 @@ var TF;
     <div id="fullContainer" class="container scrollbar">
       <div id="topRow" class="container">
         <div class="noshrink" id="paddingCell"></div>
-        <template is="dom-repeat" items="[[_runs]]" as="run">
+        <template is="dom-repeat" items="[[_runs]]" sort="_sort" as="run">
         <div class="run-name-cell noshrink">
           <span>[[run]]</span>
         </div>
@@ -2345,7 +2345,7 @@ var TF;
             <div class="tag-name-cell noshrink">
               <span class="tag-name">[[tag]]</span>
             </div>
-            <template is="dom-repeat" items="[[_runs]]" as="run">
+            <template is="dom-repeat" items="[[_runs]]" sort="_sort" as="run">
               <div class="image-cell noshrink">
                 <template is="dom-if" if="[[_exists(run, tag, runToImages.*)]]">
                   <tf-image-loader id="loader" run="[[run]]" tag="[[tag]]" images-generator="[[imagesGenerator]]" individual-image-generator="[[individualImageGenerator]]">
diff --git a/tensorflow/tensorflow.bzl b/tensorflow/tensorflow.bzl
index 76713d9a15..8cb75f4a04 100644
--- a/tensorflow/tensorflow.bzl
+++ b/tensorflow/tensorflow.bzl
@@ -252,6 +252,7 @@ def _py_wrap_cc_impl(ctx):
     cc_include_dirs += [h.dirname for h in dep.cc.transitive_headers]
     cc_includes += dep.cc.transitive_headers
   args += ["-I" + x for x in cc_include_dirs]
+  args += ["-I" + ctx.label.workspace_root]
   args += ["-o", cc_out.path]
   args += ["-outdir", py_out.dirname]
   args += [src.path]
diff --git a/tensorflow/tools/ci_build/builds/android.sh b/tensorflow/tools/ci_build/builds/android.sh
index 998090d529..73ed5ea218 100755
--- a/tensorflow/tools/ci_build/builds/android.sh
+++ b/tensorflow/tools/ci_build/builds/android.sh
@@ -22,6 +22,8 @@ model_file_name="inception5h.zip"
 tmp_model_file_name="${HOME}/.cache/tensorflow_models/${model_file_name}"
 mkdir -p $(dirname ${tmp_model_file_name})
 [ -e "${tmp_model_file_name}" ] || wget -c "https://storage.googleapis.com/download.tensorflow.org/models/${model_file_name}" -O "${tmp_model_file_name}"
+# We clean up after ourselves, but not if we exit with an error, so make sure we start clean
+rm -rf tensorflow/examples/android/assets/
 unzip -o "${tmp_model_file_name}" -d tensorflow/examples/android/assets/
 
 # Modify the WORKSPACE file.
diff --git a/tensorflow/tools/ci_build/builds/configured b/tensorflow/tools/ci_build/builds/configured
index a84ded631c..d452eac65e 100755
--- a/tensorflow/tools/ci_build/builds/configured
+++ b/tensorflow/tools/ci_build/builds/configured
@@ -24,7 +24,8 @@ CONTAINER_TYPE=$( echo "$1" | tr '[:upper:]' '[:lower:]' )
 shift 1
 COMMAND=("$@")
 
-export PYTHON_BIN_PATH="${PYTHON_BIN_PATH:-$(which python)}"
+export CI_BUILD_PYTHON="${CI_BUILD_PYTHON:-python}"
+export PYTHON_BIN_PATH="${PYTHON_BIN_PATH:-$(which ${CI_BUILD_PYTHON})}"
 if [ "${CONTAINER_TYPE}" == "gpu" ]; then
   export TF_NEED_CUDA=1
 else
diff --git a/tensorflow/tools/ci_build/builds/pip.sh b/tensorflow/tools/ci_build/builds/pip.sh
index f51450bf6d..66ebf13baa 100755
--- a/tensorflow/tools/ci_build/builds/pip.sh
+++ b/tensorflow/tools/ci_build/builds/pip.sh
@@ -18,8 +18,7 @@
 # and run the Python unit tests from the source code on the installation
 #
 # Usage:
-#   pip.sh CONTAINER_TYPE [--pip-upgrade]
-# The option "--pip-upgrade" forces "--upgrade" flag during pip install.
+#   pip.sh CONTAINER_TYPE
 #
 # When executing the Python unit tests, the script obeys the shell
 # variables: PY_TEST_WHITELIST, PY_TEST_BLACKLIST, PY_TEST_GPU_BLACKLIST,
@@ -70,17 +69,22 @@ abs_path() {
     [[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
 }
 
+# Exit after a failure
+die() {
+    echo "$@"
+    exit 1
+}
+
 # Get the command line arguments
 CONTAINER_TYPE=$( echo "$1" | tr '[:upper:]' '[:lower:]' )
 
 PIP_BUILD_TARGET="//tensorflow/tools/pip_package:build_pip_package"
 if [[ ${CONTAINER_TYPE} == "cpu" ]]; then
-  bazel build -c opt ${PIP_BUILD_TARGET}
+  bazel build -c opt ${PIP_BUILD_TARGET} || die "Build failed."
 elif [[ ${CONTAINER_TYPE} == "gpu" ]]; then
-  bazel build -c opt --config=cuda ${PIP_BUILD_TARGET}
+  bazel build -c opt --config=cuda ${PIP_BUILD_TARGET} || die "Build failed."
 else
-  echo "Unrecognized container type: \"${CONTAINER_TYPE}\""
-  exit 1
+  die "Unrecognized container type: \"${CONTAINER_TYPE}\""
 fi
 
 echo "PY_TEST_WHITELIST: ${PY_TEST_WHITELIST}"
@@ -92,6 +96,22 @@ if [[ ${CONTAINER_TYPE} == "gpu" ]]; then
   PY_TEST_BLACKLIST="${PY_TEST_BLACKLIST}:${PY_TEST_GPU_BLACKLIST}"
 fi
 
+# Obtain the path to Python binary
+source tools/python_bin_path.sh
+
+# Assume: PYTHON_BIN_PATH is exported by the script above
+if [[ -z "$PYTHON_BIN_PATH" ]]; then
+  die "PYTHON_BIN_PATH was not provided. Did you run configure?"
+fi
+
+# Determine the major and minor versions of Python being used (e.g., 2.7)
+# This info will be useful for determining the directory of the local pip
+# installation of Python
+PY_MAJOR_MINOR_VER=$(${PYTHON_BIN_PATH} -V 2>&1 | awk '{print $NF}' | cut -d. -f-2)
+
+echo "Python binary path to be used in PIP install-test: ${PYTHON_BIN_PATH} "\
+"(Major.Minor version: ${PY_MAJOR_MINOR_VER})"
+
 # Build PIP Wheel file
 PIP_WHL_DIR="pip_test/whl"
 PIP_WHL_DIR=`abs_path ${PIP_WHL_DIR}`  # Get absolute path
@@ -101,9 +121,8 @@ bazel-bin/tensorflow/tools/pip_package/build_pip_package ${PIP_WHL_DIR} &&
 # Perform installation
 WHL_PATH=`ls ${PIP_WHL_DIR}/tensorflow*.whl`
 if [[ `echo ${WHL_PATH} | wc -w` -ne 1 ]]; then
-  echo "ERROR: Failed to find exactly one built TensorFlow .whl file in "\
+  die "ERROR: Failed to find exactly one built TensorFlow .whl file in "\
 "directory: ${PIP_WHL_DIR}"
-  exit 1
 fi
 
 echo "whl file path = ${WHL_PATH}"
@@ -111,12 +130,10 @@ echo "whl file path = ${WHL_PATH}"
 # Install, in user's local home folder
 echo "Installing pip whl file: ${WHL_PATH}"
 
-UPGRADE_OPT=""
-if [[ $2 == "--pip-upgrade" ]]; then
-  UPGRADE_OPT="--upgrade"
-fi
-
-pip install -v --user ${UPGRADE_OPT} ${WHL_PATH} &&
+# Call pip install twice, first time with --upgrade and second time without it
+# This addresses the sporadic test failures related to protobuf version
+${PYTHON_BIN_PATH} -m pip install -v --user --upgrade ${WHL_PATH} numpy==1.8.2 &&
+${PYTHON_BIN_PATH} -m pip install -v --user ${WHL_PATH} &&
 
 # If NO_TEST_ON_INSTALL is set to any non-empty value, skip all Python
 # tests-on-install and exit right away
@@ -144,8 +161,8 @@ mkdir ${PY_TEST_LOG_DIR}
 LIB_PYTHON_DIR=""
 
 # Candidate locations of the local Python library directory
-LIB_PYTHON_DIR_CANDS="${HOME}/.local/lib/python* "\
-"${HOME}/Library/Python/*/lib/python"
+LIB_PYTHON_DIR_CANDS="${HOME}/.local/lib/python${PY_MAJOR_MINOR_VER}* "\
+"${HOME}/Library/Python/${PY_MAJOR_MINOR_VER}*/lib/python"
 
 for CAND in ${LIB_PYTHON_DIR_CANDS}; do
   if [[ -d "${CAND}" ]]; then
@@ -155,8 +172,7 @@ for CAND in ${LIB_PYTHON_DIR_CANDS}; do
 done
 
 if [[ -z ${LIB_PYTHON_DIR} ]]; then
-  echo "Failed to find local Python library directory"
-  exit 1
+  die "Failed to find local Python library directory"
 else
   echo "Found local Python library directory at: ${LIB_PYTHON_DIR}"
 fi
@@ -185,11 +201,12 @@ cp -r tensorflow/core/lib/png ${PY_TEST_DIR}/tensorflow/core/lib
 # Run tests
 DIR0=`pwd`
 ALL_PY_TESTS=`find tensorflow/python -name "*_test.py"`
+# TODO(cais): Add tests in tensorflow/contrib
+
 PY_TEST_COUNT=`echo ${ALL_PY_TESTS} | wc -w`
 
 if [[ ${PY_TEST_COUNT} -eq 0 ]]; then
-  echo "ERROR: Cannot find any tensorflow Python unit tests to run on install"
-  exit 1
+  die "ERROR: Cannot find any tensorflow Python unit tests to run on install"
 fi
 
 # Iterate through all the Python unit test files using the installation
@@ -237,7 +254,7 @@ for TEST_FILE_PATH in ${ALL_PY_TESTS}; do
   # avoid the possibility of picking up dependencies from the
   # source directory
   cd ${PY_TEST_DIR}
-  python ${PY_TEST_DIR}/${TEST_BASENAME} >${TEST_LOG} 2>&1
+  ${PYTHON_BIN_PATH} ${PY_TEST_DIR}/${TEST_BASENAME} >${TEST_LOG} 2>&1
 
   # Check for pass or failure status of the test outtput and exit
   if [[ $? -eq 0 ]]; then
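
Note on the `pip.sh` hunks above: `PY_MAJOR_MINOR_VER` is derived by taking the last whitespace-separated field of the `-V` output and keeping its first two dot-separated components. The same transformation is sketched below in Python so it can be checked in isolation; the function name `major_minor` is hypothetical and is not part of the script.

```python
def major_minor(version_output):
    """Return 'MAJOR.MINOR' from text such as 'Python 2.7.6'.

    Mirrors `awk '{print $NF}' | cut -d. -f-2`: take the last field,
    then keep the first two dot-separated components.
    """
    last_field = version_output.split()[-1]
    return ".".join(last_field.split(".")[:2])


print(major_minor("Python 2.7.6"))
print(major_minor("Python 3.4.3\n"))
```

The result is what narrows `LIB_PYTHON_DIR_CANDS` to the directory of the interpreter actually being tested.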
diff --git a/tensorflow/tools/ci_build/builds/print_build_info.sh b/tensorflow/tools/ci_build/builds/print_build_info.sh
index 05d989b0f6..f243c185c0 100755
--- a/tensorflow/tools/ci_build/builds/print_build_info.sh
+++ b/tensorflow/tools/ci_build/builds/print_build_info.sh
@@ -31,43 +31,58 @@ shift 1
 COMMAND=("$@")
 
 # Information about machine and OS
-OS=`uname`
-KERNEL=`uname -r`
+OS=$(uname)
+KERNEL=$(uname -r)
 
-ARCH=`uname -p`
-PROCESSOR=`grep "model name" /proc/cpuinfo | head -1 | awk '{print substr($0, index($0, $4))}'`
-PROCESSOR_COUNT=`grep "model name" /proc/cpuinfo | wc -l`
+ARCH=$(uname -p)
+PROCESSOR=$(grep "model name" /proc/cpuinfo | head -1 | awk '{print substr($0, index($0, $4))}')
+PROCESSOR_COUNT=$(grep "model name" /proc/cpuinfo | wc -l)
 
-MEM_TOTAL=`grep MemTotal /proc/meminfo | awk '{print $2, $3}'`
-SWAP_TOTAL=`grep SwapTotal /proc/meminfo | awk '{print $2, $3}'`
+MEM_TOTAL=$(grep MemTotal /proc/meminfo | awk '{print $2, $3}')
+SWAP_TOTAL=$(grep SwapTotal /proc/meminfo | awk '{print $2, $3}')
 
 # Information about build tools
-BAZEL_VER=`bazel version | head -1`
-JAVA_VER=`javac -version 2>&1 | awk '{print $2}'`
-PYTHON_VER=`python -V 2>&1 | awk '{print $2}'`
-GPP_VER=`g++ --version | head -1`
-SWIG_VER=`swig -version | grep -m 1 . | awk '{print $3}'`
+if [[ ! -z $(which bazel) ]]; then
+  BAZEL_VER=$(bazel version | head -1)
+fi
+
+if [[ ! -z $(which javac) ]]; then
+  JAVA_VER=$(javac -version 2>&1 | awk '{print $2}')
+fi
+
+if [[ ! -z $(which python) ]]; then
+  PYTHON_VER=$(python -V 2>&1 | awk '{print $2}')
+fi
+
+if [[ ! -z $(which g++) ]]; then
+  GPP_VER=$(g++ --version | head -1)
+fi
+
+if [[ ! -z $(which swig) ]]; then
+  SWIG_VER=$(swig -version | grep -m 1 . | awk '{print $3}')
+fi
 
 # Information about TensorFlow source
-TF_FETCH_URL=`git remote show origin | grep "Fetch URL:" | awk '{print $3}'`
-TF_HEAD=`git rev-parse HEAD`
+TF_FETCH_URL=$(git remote show origin | grep "Fetch URL:" | awk '{print $3}')
+TF_HEAD=$(git rev-parse HEAD)
 
 # NVIDIA & CUDA info
 NVIDIA_DRIVER_VER=""
 if [[ -f /proc/driver/nvidia/version ]]; then
-  NVIDIA_DRIVER_VER=`head -1 /proc/driver/nvidia/version | awk '{print $(NF-6)}'`
+  NVIDIA_DRIVER_VER=$(head -1 /proc/driver/nvidia/version | awk '{print $(NF-6)}')
 fi
 
 CUDA_DEVICE_COUNT="0"
 CUDA_DEVICE_NAMES=""
-if [[ ! -z `which nvidia-debugdump` ]]; then
-  CUDA_DEVICE_COUNT=`nvidia-debugdump -l | grep "^Found [0-9]*.*device.*" | awk '{print $2}'`
-  CUDA_DEVICE_NAMES=`nvidia-debugdump -l | grep "Device name:.*" | awk '{print substr($0, index($0, $3)) ","}'`
+if [[ ! -z $(which nvidia-debugdump) ]]; then
+  CUDA_DEVICE_COUNT=$(nvidia-debugdump -l | grep "^Found [0-9]*.*device.*" | awk '{print $2}')
+  CUDA_DEVICE_NAMES=$(nvidia-debugdump -l | grep "Device name:.*" | awk '{print substr($0, index($0,\
+ $3)) ","}')
 fi
 
 CUDA_TOOLKIT_VER=""
-if [[ ! -z 'which nvcc' ]]; then
-  CUDA_TOOLKIT_VER=`nvcc -V | grep release | awk '{print $(NF)}'`
+if [[ ! -z $(which nvcc) ]]; then
+  CUDA_TOOLKIT_VER=$(nvcc -V | grep release | awk '{print $(NF)}')
 fi
 
 # Print info
@@ -87,6 +102,7 @@ echo "TF_BUILD_INFO = {"\
 "Java_version: \"${JAVA_VER}\", "\
 "Python_version: \"${PYTHON_VER}\", "\
 "gpp_version: \"${GPP_VER}\", "\
+"swig_version: \"${SWIG_VER}\", "\
 "NVIDIA_driver_version: \"${NVIDIA_DRIVER_VER}\", "\
 "CUDA_device_count: \"${CUDA_DEVICE_COUNT}\", "\
 "CUDA_device_names: \"${CUDA_DEVICE_NAMES}\", "\
diff --git a/tensorflow/tools/ci_build/builds/with_the_same_user b/tensorflow/tools/ci_build/builds/with_the_same_user
index 7867ae4942..bab6f14c10 100755
--- a/tensorflow/tools/ci_build/builds/with_the_same_user
+++ b/tensorflow/tools/ci_build/builds/with_the_same_user
@@ -31,8 +31,9 @@ getent group "${CI_BUILD_GID}" || addgroup --gid "${CI_BUILD_GID}" "${CI_BUILD_G
 getent passwd "${CI_BUILD_UID}" || adduser --gid "${CI_BUILD_GID}" --uid "${CI_BUILD_UID}" \
     --gecos "${CI_BUILD_USER} (generated by with_the_same_user script)" \
     --disabled-password --home "${CI_BUILD_HOME}" --quiet "${CI_BUILD_USER}"
+sudo usermod -a -G sudo "${CI_BUILD_USER}"
 
 cp /root/.bazelrc "${CI_BUILD_HOME}/.bazelrc"
 chown "${CI_BUILD_UID}:${CI_BUILD_GID}" "${CI_BUILD_HOME}/.bazelrc"
 
-sudo -u "#${CI_BUILD_UID}" --preserve-env -H ${COMMAND[@]}
+sudo -u "#${CI_BUILD_UID}" --preserve-env "HOME=${CI_BUILD_HOME}" ${COMMAND[@]}
diff --git a/tensorflow/tools/ci_build/ci_parameterized_build.sh b/tensorflow/tools/ci_build/ci_parameterized_build.sh
index 49e6a8347a..97b25f32a0 100755
--- a/tensorflow/tools/ci_build/ci_parameterized_build.sh
+++ b/tensorflow/tools/ci_build/ci_parameterized_build.sh
@@ -41,6 +41,14 @@
 #   TF_BUILD_BAZEL_TARGET:
 #                      Used to override the default bazel build target:
 #                      //tensorflow/...
+#   TF_BUILD_BAZEL_CLEAN:
+#                      Will perform "bazel clean", if and only if this variable
+#                      is set to any non-empty and non-0 value
+#   TF_BUILD_SERIAL_TESTS:
+#                      Build in parallel, but test serially
+#                      (i.e., bazel test --jobs=1), potentially useful for
+#                      builds where the tests cannot be run in parallel due to
+#                      resource contention (e.g., for GPU builds)
 #
 # This script can be used by Jenkins parameterized / matrix builds.
 
@@ -65,15 +73,29 @@ DOCKER_MAIN_CMD="${CI_BUILD_DIR}/ci_build.sh"
 NO_DOCKER_MAIN_CMD="${CI_BUILD_DIR}/builds/configured"
 
 # Additional option flags to apply when Docker is unavailable (e.g., on Mac)
-NO_DOCKER_OPT_FLAG="--linkopt=-headerpad_max_install_names"
+NO_DOCKER_OPT_FLAG="--linkopt=-headerpad_max_install_names "\
+"--genrule_strategy=standalone"
+
+DO_DOCKER=1
 
 BAZEL_CMD="bazel test"
+BAZEL_BUILD_ONLY_CMD="bazel build"
+BAZEL_CLEAN_CMD="bazel clean"
+BAZEL_SERIAL_FLAG="--jobs=1"
+
 PIP_CMD="${CI_BUILD_DIR}/builds/pip.sh"
 ANDROID_CMD="${CI_BUILD_DIR}/builds/android.sh"
 
 BAZEL_TARGET="//tensorflow/..."
+
+
+
 ##########################################################
 
+echo "Parameterized build starts at: $(date)"
+echo ""
+START_TIME=$(date +'%s')
+
 # Convert all the required environment variables to lower case
 TF_BUILD_CONTAINER_TYPE=$(to_lower ${TF_BUILD_CONTAINER_TYPE})
 TF_BUILD_PYTHON_VERSION=$(to_lower ${TF_BUILD_PYTHON_VERSION})
@@ -92,6 +114,8 @@ echo "  TF_BUILD_APPEND_CI_DOCKER_EXTRA_PARAMS="\
 "${TF_BUILD_APPEND_CI_DOCKER_EXTRA_PARAMS}"
 echo "  TF_BUILD_APPEND_ARGUMENTS=${TF_BUILD_APPEND_ARGUMENTS}"
 echo "  TF_BUILD_BAZEL_TARGET=${TF_BUILD_BAZEL_TARGET}"
+echo "  TF_BUILD_BAZEL_CLEAN=${TF_BUILD_BAZEL_CLEAN}"
+echo "  TF_BUILD_SERIAL_TESTS=${TF_BUILD_SERIAL_TESTS}"
 
 # Process container type
 CTYPE=${TF_BUILD_CONTAINER_TYPE}
@@ -111,13 +135,14 @@ fi
 EXTRA_PARAMS=""
 
 # Determine if Docker is available
-MAIN_CMD=${DOCKER_MAIN_CMD}
 if [[ -z "$(which docker)" ]]; then
+  DO_DOCKER=0
+
   echo "It appears that Docker is not available on this system. "\
 "Will perform build without Docker."
-  echo "In addition, the additional option flags will be applied to the build:"
+  echo "Also, the additional option flags will be applied to the build:"
   echo "  ${NO_DOCKER_OPT_FLAG}"
-  MAIN_CMD=${NO_DOCKER_MAIN_CMD}
+  MAIN_CMD="${NO_DOCKER_MAIN_CMD} ${CTYPE}"
   OPT_FLAG="${OPT_FLAG} ${NO_DOCKER_OPT_FLAG}"
 
 fi
@@ -150,10 +175,27 @@ if [[ ${TF_BUILD_IS_PIP} == "no_pip" ]]; then
 
   if [[ ${CTYPE} == "cpu" ]] || [[ ${CTYPE} == "gpu" ]]; then
     # Run Bazel
-    MAIN_CMD="${MAIN_CMD} ${CTYPE} ${BAZEL_CMD} ${OPT_FLAG} "\
+    MAIN_CMD="${MAIN_CMD} ${BAZEL_CMD} ${OPT_FLAG} "\
 "${TF_BUILD_APPEND_ARGUMENTS} ${BAZEL_TARGET}"
+    MAIN_CMD=$(str_strip "${MAIN_CMD}")
+
+    if [[ ! -z "${TF_BUILD_SERIAL_TESTS}" ]] &&
+       [[ "${TF_BUILD_SERIAL_TESTS}" != "0" ]]; then
+      # Break the operation into two steps: build and test
+      # The 1st (build) step will be done in parallel, by default
+      # But the 2nd (test) step will be done serially.
+
+      BUILD_ONLY_CMD="${BAZEL_BUILD_ONLY_CMD} ${OPT_FLAG} "\
+"${TF_BUILD_APPEND_ARGUMENTS} ${BAZEL_TARGET}"
+      echo "Build-only command: ${BUILD_ONLY_CMD}"
+
+      MAIN_CMD="${BUILD_ONLY_CMD} && "\
+"${BAZEL_CMD} ${OPT_FLAG} ${BAZEL_SERIAL_FLAG} "\
+"${TF_BUILD_APPEND_ARGUMENTS} ${BAZEL_TARGET}"
+      echo "Parallel-build + serial-test command: ${MAIN_CMD}"
+    fi
   elif [[ ${CTYPE} == "android" ]]; then
-    MAIN_CMD="${MAIN_CMD} ${CTYPE} ${ANDROID_CMD} ${OPT_FLAG} "
+    MAIN_CMD="${ANDROID_CMD} ${OPT_FLAG} "
   fi
 elif [[ ${TF_BUILD_IS_PIP} == "pip" ]]; then
   # Android builds conflict with PIP builds
@@ -163,7 +205,7 @@ elif [[ ${TF_BUILD_IS_PIP} == "pip" ]]; then
     exit 0
   fi
 
-  MAIN_CMD="${MAIN_CMD} ${CTYPE} ${PIP_CMD} ${CTYPE} "\
+  MAIN_CMD="${MAIN_CMD} ${PIP_CMD} ${CTYPE} "\
 "${TF_BUILD_APPEND_ARGUMENTS}"
 else
   echo "Unrecognized value in TF_BUILD_IS_PIP: \"${TF_BUILD_IS_PIP}\""
@@ -174,7 +216,22 @@ fi
 if [[ ${TF_BUILD_PYTHON_VERSION} == "python2" ]]; then
   :
 elif [[ ${TF_BUILD_PYTHON_VERSION} == "python3" ]]; then
-  EXTRA_PARAMS="${EXTRA_PARAMS} -e PYTHON_BIN_PATH=/usr/bin/python3"
+  # Supply proper environment variable to select Python 3
+  if [[ "${DO_DOCKER}" == "1" ]]; then
+    EXTRA_PARAMS="${EXTRA_PARAMS} -e CI_BUILD_PYTHON=python3"
+  else
+    # Determine the path to python3
+    PYTHON3_PATH=$(which python3 | head -1)
+    if [[ -z "${PYTHON3_PATH}" ]]; then
+      echo "ERROR: Failed to locate python3 binary on the system"
+      exit 1
+    else
+      echo "Found python3 binary at: ${PYTHON3_PATH}"
+    fi
+
+    export PYTHON_BIN_PATH="${PYTHON3_PATH}"
+  fi
+
 else
   echo "Unrecognized value in TF_BUILD_PYTHON_VERSION: "\
 "\"${TF_BUILD_PYTHON_VERSION}\""
@@ -184,17 +241,58 @@ fi
 # Append additional Docker extra parameters
 EXTRA_PARAMS="${EXTRA_PARAMS} ${TF_BUILD_APPEND_CI_DOCKER_EXTRA_PARAMS}"
 
-# Strip leading and trailing whitespaces
-EXTRA_PARAMS=$(str_strip "${EXTRA_PARAMS}")
-
 # Finally, do a dry run or call the command
-echo "Final command assembled by parameterized build: "
-echo "CI_DOCKER_EXTRA_PARAMS=\"${EXTRA_PARAMS}\" ${MAIN_CMD}"
+
+# The command, which may consist of multiple parts (e.g., in the case of
+# TF_BUILD_SERIAL_TESTS=1), is written to a bash script, which is
+# then called. The name of the script is randomized to make concurrent
+# builds on the node possible.
+RAND_STR=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | head -n 1)
+TMP_SCRIPT=/tmp/ci_parameterized_build_${RAND_STR}.sh
+
+if [[ "${DO_DOCKER}" == "1" ]]; then
+  # Map the tmp script into the Docker container
+  EXTRA_PARAMS="${EXTRA_PARAMS} -v ${TMP_SCRIPT}:/tmp/tf_build.sh"
+  EXTRA_PARAMS=$(str_strip "${EXTRA_PARAMS}")
+
+  echo "Exporting CI_DOCKER_EXTRA_PARAMS: ${EXTRA_PARAMS}"
+  export CI_DOCKER_EXTRA_PARAMS="${EXTRA_PARAMS}"
+fi
+
+# Write to the tmp script
+echo "#!/bin/bash" > ${TMP_SCRIPT}
+if [[ ! -z "${TF_BUILD_BAZEL_CLEAN}" ]] &&
+   [[ "${TF_BUILD_BAZEL_CLEAN}" != "0" ]]; then
+  echo ${BAZEL_CLEAN_CMD} >> ${TMP_SCRIPT}
+fi
+echo ${MAIN_CMD} >> ${TMP_SCRIPT}
+
+echo "Executing final command (${TMP_SCRIPT})..."
+echo "=========================================="
+cat ${TMP_SCRIPT}
+echo "=========================================="
+echo ""
+
+chmod +x ${TMP_SCRIPT}
+
 if [[ ! -z "${TF_BUILD_DRY_RUN}" ]] && [[ ${TF_BUILD_DRY_RUN} != "0" ]]; then
   # Do a dry run: just print the final command
   echo "*** This is a DRY RUN ***"
 else
-  # Call the command
-  echo "Executing final command..."
-  CI_DOCKER_EXTRA_PARAMS="${EXTRA_PARAMS}" ${MAIN_CMD}
-fi
+  # Actually run the command
+  if [[ "${DO_DOCKER}" == "1" ]]; then
+    ${DOCKER_MAIN_CMD} ${CTYPE} /tmp/tf_build.sh
+  else
+    ${TMP_SCRIPT}
+  fi
+fi && FAILURE=0 || FAILURE=1
+[[ ${FAILURE} == "0" ]] && RESULT="SUCCESS" || RESULT="FAILURE"
+
+rm -f ${TMP_SCRIPT}
+
+END_TIME=$(date +'%s')
+echo ""
+echo "Parameterized build ends with ${RESULT} at: $(date) "\
+"(Elapsed time: $((${END_TIME} - ${START_TIME})) s)"
+
+exit ${FAILURE}
diff --git a/tensorflow/tools/docker/Dockerfile b/tensorflow/tools/docker/Dockerfile
index 552d974c52..69e502d098 100644
--- a/tensorflow/tools/docker/Dockerfile
+++ b/tensorflow/tools/docker/Dockerfile
@@ -28,9 +28,9 @@ RUN pip --no-cache-dir install \
     python -m ipykernel.kernelspec
 
 # Install TensorFlow CPU version.
-ENV TENSORFLOW_VERSION 0.6.0
+ENV TENSORFLOW_VERSION 0.7.0
 RUN pip --no-cache-dir install \
-    https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
+    http://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-${TENSORFLOW_VERSION}-py2-none-linux_x86_64.whl
 
 # Set up our notebook config.
 COPY jupyter_notebook_config.py /root/.jupyter/
diff --git a/tensorflow/tools/docker/Dockerfile.devel b/tensorflow/tools/docker/Dockerfile.devel
index d361135e8a..1a30c7d700 100644
--- a/tensorflow/tools/docker/Dockerfile.devel
+++ b/tensorflow/tools/docker/Dockerfile.devel
@@ -79,7 +79,7 @@ RUN mkdir /bazel && \
 
 RUN git clone --recursive https://github.com/tensorflow/tensorflow.git && \
     cd tensorflow && \
-    git checkout 0.6.0
+    git checkout r0.7
 WORKDIR /tensorflow
 
 # TODO(craigcitro): Don't install the pip package, since it makes it
diff --git a/tensorflow/tools/docker/Dockerfile.devel-gpu b/tensorflow/tools/docker/Dockerfile.devel-gpu
index b0e7cb8d06..56de5940ab 100644
--- a/tensorflow/tools/docker/Dockerfile.devel-gpu
+++ b/tensorflow/tools/docker/Dockerfile.devel-gpu
@@ -79,7 +79,7 @@ RUN mkdir /bazel && \
 
 RUN git clone --recursive https://github.com/tensorflow/tensorflow.git && \
     cd tensorflow && \
-    git checkout 0.6.0
+    git checkout r0.7
 WORKDIR /tensorflow
 
 # Configure the build for our CUDA configuration.
diff --git a/tensorflow/tools/docker/Dockerfile.gpu b/tensorflow/tools/docker/Dockerfile.gpu
index b4e46bb0ef..77699ebb42 100644
--- a/tensorflow/tools/docker/Dockerfile.gpu
+++ b/tensorflow/tools/docker/Dockerfile.gpu
@@ -28,9 +28,9 @@ RUN pip --no-cache-dir install \
     python -m ipykernel.kernelspec
 
 # Install TensorFlow GPU version.
-ENV TENSORFLOW_VERSION 0.6.0
+ENV TENSORFLOW_VERSION 0.7.0
 RUN pip --no-cache-dir install \
-    https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
+    http://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-py2-none-linux_x86_64.whl
 
 # Set up our notebook config.
 COPY jupyter_notebook_config.py /root/.jupyter/
diff --git a/tensorflow/tools/docs/gen_cc_md.py b/tensorflow/tools/docs/gen_cc_md.py
index 433ad1ccbb..8ad3eb80d2 100644
--- a/tensorflow/tools/docs/gen_cc_md.py
+++ b/tensorflow/tools/docs/gen_cc_md.py
@@ -29,17 +29,13 @@ from tensorflow.python import flags
 
 ANCHOR_RE = re.compile(r'\W+')
 
-PAGE_TEMPLATE = '''# {0} `{1}`
+PAGE_TEMPLATE = '''# `{0} {1}`
 
 {2}
 
-##Member Summary
+###Member Details
 
-{3}
-
-##Member Details
-
-{4}'''
+{3}'''
 
 INDEX_TEMPLATE = '''# TensorFlow C++ Session API reference documentation
 
@@ -235,13 +231,6 @@ def index_page(pages):
       all_md_files.append(pages[page_index].get_md_filename())
       pages.pop(page_index)
 
-  # Footer
-  lines.append('''
-
-<div class='sections-order' style="display: none;">
-<!--''')
-  lines.extend([('<!-- %s -->' % f) for f in all_md_files])
-  lines.extend(['-->', '</div>'])
   return '\n'.join(lines)
 
 
@@ -270,7 +259,7 @@ class Page(object):
     fulls = all_fulls(members)
     self.overview = page_overview(soup.find('compounddef'))
     self.page_text = PAGE_TEMPLATE.format(
-        self.type, self.name, self.overview, briefs, fulls)
+        self.type, self.name, self.overview, fulls)
 
   def get_text(self):
     return self.page_text
@@ -298,9 +287,9 @@ def main(unused_argv):
     if len(fname) < 6: continue
     newpage = None
     if fname[0:5] == 'class':
-      newpage = Page(os.path.join(FLAGS.src_dir, fname), 'Class')
+      newpage = Page(os.path.join(FLAGS.src_dir, fname), 'class')
     elif fname[0:6] == 'struct':
-      newpage = Page(os.path.join(FLAGS.src_dir, fname), 'Struct')
+      newpage = Page(os.path.join(FLAGS.src_dir, fname), 'struct')
     if newpage is not None and page_in_name_list(newpage, all_pages):
       pages.append(newpage)
       md_filename = newpage.get_md_filename()
@@ -314,9 +303,4 @@ def main(unused_argv):
   return 0
 
 if __name__ == '__main__':
-  try:
-    argv = FLAGS(sys.argv)  # parse flags
-  except flags.FlagsError as e:
-    print('%s\\nUsage: %s ARGS\\n%s' % (e, sys.argv[0], FLAGS))
-    sys.exit(1)
-  main(argv)
+  main(sys.argv)
diff --git a/tensorflow/tools/docs/gen_docs.sh b/tensorflow/tools/docs/gen_docs.sh
index 7f86fc93ef..95c0092d4a 100755
--- a/tensorflow/tools/docs/gen_docs.sh
+++ b/tensorflow/tools/docs/gen_docs.sh
@@ -14,7 +14,7 @@
 # limitations under the License.
 # ==============================================================================
 
-# This script needs to be run from the tensorflow/tools directory
+# This script needs to be run from the tensorflow/tools/docs directory
 # Pass -a to also rebuild C++ docs. This requires doxygen.
 
 set -e
diff --git a/tensorflow/tools/docs/tf-doxy_for_md-config b/tensorflow/tools/docs/tf-doxy_for_md-config
index 69859e53fd..e2d0a44f18 100644
--- a/tensorflow/tools/docs/tf-doxy_for_md-config
+++ b/tensorflow/tools/docs/tf-doxy_for_md-config
@@ -733,7 +733,7 @@ WARN_LOGFILE           =
 # spaces.
 # Note: If this tag is empty the current directory is searched.
 
-INPUT                  = core/
+INPUT                  = core/framework core/lib/core core/platform core/public
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
diff --git a/tensorflow/tools/pip_package/build_pip_package.sh b/tensorflow/tools/pip_package/build_pip_package.sh
index af8b214763..090bd27830 100755
--- a/tensorflow/tools/pip_package/build_pip_package.sh
+++ b/tensorflow/tools/pip_package/build_pip_package.sh
@@ -33,8 +33,10 @@ function main() {
     exit 1
   fi
   cp -R \
-    bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/* \
+    bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/{tensorflow,external,google} \
     ${TMPDIR}
+  # TODO: We should have a cleaner solution for this after the 0.7 release
+  rm -rf ${TMPDIR}/external/eigen_archive
 
   cp tensorflow/tools/pip_package/MANIFEST.in ${TMPDIR}
   cp tensorflow/tools/pip_package/README ${TMPDIR}
diff --git a/tensorflow/tools/pip_package/setup.py b/tensorflow/tools/pip_package/setup.py
index f9f7a278fb..181e2f898e 100644
--- a/tensorflow/tools/pip_package/setup.py
+++ b/tensorflow/tools/pip_package/setup.py
@@ -26,7 +26,7 @@ from setuptools import find_packages, setup, Command, Extension
 from setuptools.command.install import install as InstallCommandBase
 from setuptools.dist import Distribution
 
-_VERSION = '0.6.0'
+_VERSION = '0.7.0'
 
 REQUIRED_PACKAGES = [
     'numpy >= 1.8.2',
@@ -43,7 +43,7 @@ else:
 
 # pylint: disable=line-too-long
 CONSOLE_SCRIPTS = [
-    'tensorboard = tensorflow.tensorboard.tensorboard:main',
+    'tensorboard = tensorflow.tensorboard.backend.tensorboard:main',
 ]
 # pylint: enable=line-too-long
 
@@ -157,7 +157,7 @@ setup(
     version=_VERSION,
     description='TensorFlow helps the tensors flow',
     long_description='',
-    url='http://tensorflow.com/',
+    url='http://tensorflow.org/',
     author='Google Inc.',
     author_email='opensource@google.com',
     # Contained modules and scripts.
diff --git a/tensorflow/user_ops/ackermann_test.py b/tensorflow/user_ops/ackermann_test.py
index 9464830979..bb757e69d1 100644
--- a/tensorflow/user_ops/ackermann_test.py
+++ b/tensorflow/user_ops/ackermann_test.py
@@ -33,7 +33,7 @@ class AckermannTest(tf.test.TestCase):
     self.assertEqual(ackermann.OP_LIST.op[0].name, 'Ackermann')
 
     with self.test_session():
-      self.assertEqual(ackermann.ackermann().eval(), 'A(m, 0) == A(m-1, 1)')
+      self.assertEqual(ackermann.ackermann().eval(), b'A(m, 0) == A(m-1, 1)')
 
 
 if __name__ == '__main__':
diff --git a/third_party/gpus/cuda/BUILD b/third_party/gpus/cuda/BUILD
index bc6320f487..fa29fde449 100644
--- a/third_party/gpus/cuda/BUILD
+++ b/third_party/gpus/cuda/BUILD
@@ -57,10 +57,10 @@ cc_library(
 cc_library(
     name = "cudart",
     srcs = [
-        "lib64/libcudart.so." + tf_get_cuda_version(),
+        "lib64/libcudart.so" + tf_get_cuda_version(),
     ],
     data = [
-        "lib64/libcudart.so." + tf_get_cuda_version(),
+        "lib64/libcudart.so" + tf_get_cuda_version(),
     ],
     includes = ["include/"],
     visibility = ["//visibility:public"],
@@ -70,10 +70,10 @@ cc_library(
 cc_library(
     name = "cublas",
     srcs = [
-        "lib64/libcublas.so." + tf_get_cuda_version(),
+        "lib64/libcublas.so" + tf_get_cuda_version(),
     ],
     data = [
-        "lib64/libcublas.so." + tf_get_cuda_version(),
+        "lib64/libcublas.so" + tf_get_cuda_version(),
     ],
     includes = ["include/"],
     visibility = ["//visibility:public"],
@@ -83,10 +83,10 @@ cc_library(
 cc_library(
     name = "cudnn",
     srcs = [
-        "lib64/libcudnn.so." + tf_get_cudnn_version(),
+        "lib64/libcudnn.so" + tf_get_cudnn_version(),
     ],
     data = [
-        "lib64/libcudnn.so." + tf_get_cudnn_version(),
+        "lib64/libcudnn.so" + tf_get_cudnn_version(),
     ],
     includes = ["include/"],
     visibility = ["//visibility:public"],
@@ -96,10 +96,10 @@ cc_library(
 cc_library(
     name = "cufft",
     srcs = [
-        "lib64/libcufft.so." + tf_get_cuda_version(),
+        "lib64/libcufft.so" + tf_get_cuda_version(),
     ],
     data = [
-        "lib64/libcufft.so." + tf_get_cuda_version(),
+        "lib64/libcufft.so" + tf_get_cuda_version(),
     ],
     includes = ["include/"],
     visibility = ["//visibility:public"],
@@ -134,10 +134,10 @@ genrule(
         "include/cublas.h",
         "include/cudnn.h",
         "lib64/libcudart_static.a",
-        "lib64/libcublas.so." + tf_get_cuda_version(),
-        "lib64/libcudnn.so." + tf_get_cudnn_version(),
-        "lib64/libcudart.so." + tf_get_cuda_version(),
-        "lib64/libcufft.so." + tf_get_cuda_version(),
+        "lib64/libcublas.so" + tf_get_cuda_version(),
+        "lib64/libcudnn.so" + tf_get_cudnn_version(),
+        "lib64/libcudart.so" + tf_get_cuda_version(),
+        "lib64/libcufft.so" + tf_get_cuda_version(),
     ],
     cmd = if_cuda(
         # Under cuda config, create all the symbolic links to the actual cuda files
@@ -151,10 +151,10 @@ genrule(
           "touch $(@D)/include/cublas.h",
           "touch $(@D)/include/cudnn.h",
           "touch $(@D)/lib64/libcudart_static.a",
-          "touch $(@D)/lib64/libcublas.so." + tf_get_cuda_version(),
-          "touch $(@D)/lib64/libcudnn.so." + tf_get_cudnn_version(),
-          "touch $(@D)/lib64/libcudart.so." + tf_get_cuda_version(),
-          "touch $(@D)/lib64/libcufft.so." + tf_get_cuda_version(),
+          "touch $(@D)/lib64/libcublas.so" + tf_get_cuda_version(),
+          "touch $(@D)/lib64/libcudnn.so" + tf_get_cudnn_version(),
+          "touch $(@D)/lib64/libcudart.so" + tf_get_cuda_version(),
+          "touch $(@D)/lib64/libcufft.so" + tf_get_cuda_version(),
             ]),
     ),
     local = 1,
diff --git a/third_party/gpus/cuda/cuda_config.sh b/third_party/gpus/cuda/cuda_config.sh
index 87c35349c0..21f36d4416 100755
--- a/third_party/gpus/cuda/cuda_config.sh
+++ b/third_party/gpus/cuda/cuda_config.sh
@@ -110,18 +110,18 @@ if [ "$CHECK_ONLY" == "1" ]; then
   CheckAndLinkToSrcTree CudaError include/cublas.h
   CheckAndLinkToSrcTree CudnnError include/cudnn.h
   CheckAndLinkToSrcTree CudaError lib64/libcudart_static.a
-  CheckAndLinkToSrcTree CudaError lib64/libcublas.so.$TF_CUDA_VERSION
-  CheckAndLinkToSrcTree CudnnError lib64/libcudnn.so.$TF_CUDNN_VERSION
-  CheckAndLinkToSrcTree CudaError lib64/libcudart.so.$TF_CUDA_VERSION
-  CheckAndLinkToSrcTree CudaError lib64/libcufft.so.$TF_CUDA_VERSION
+  CheckAndLinkToSrcTree CudaError lib64/libcublas.so$TF_CUDA_VERSION
+  CheckAndLinkToSrcTree CudnnError lib64/libcudnn.so$TF_CUDNN_VERSION
+  CheckAndLinkToSrcTree CudaError lib64/libcudart.so$TF_CUDA_VERSION
+  CheckAndLinkToSrcTree CudaError lib64/libcufft.so$TF_CUDA_VERSION
   exit 0
 fi
 
 # Actually configure the source tree for TensorFlow's canonical view of Cuda
 # libraries.
 
-if test ! -e ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so.$TF_CUDA_VERSION; then
-  CudaError "cannot find ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so.$TF_CUDA_VERSION"
+if test ! -e ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so$TF_CUDA_VERSION; then
+  CudaError "cannot find ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so$TF_CUDA_VERSION"
 fi
 
 if test ! -d ${CUDNN_INSTALL_PATH}; then
@@ -138,9 +138,9 @@ else
 fi
 
 # Locate libcudnn.so.${$TF_CUDNN_VERSION}
-if test -e ${CUDNN_INSTALL_PATH}/libcudnn.so.$TF_CUDNN_VERSION; then
+if test -e ${CUDNN_INSTALL_PATH}/libcudnn.so$TF_CUDNN_VERSION; then
   CUDNN_LIB_PATH=${CUDNN_INSTALL_PATH}
-elif test -e ${CUDNN_INSTALL_PATH}/lib64/libcudnn.so.$TF_CUDNN_VERSION; then
+elif test -e ${CUDNN_INSTALL_PATH}/lib64/libcudnn.so$TF_CUDNN_VERSION; then
   CUDNN_LIB_PATH=${CUDNN_INSTALL_PATH}/lib64
 else
   CudnnError "cannot find libcudnn.so.$TF_CUDNN_VERSION under: ${CUDNN_INSTALL_PATH}"
@@ -182,4 +182,4 @@ LinkAllFiles ${CUDA_TOOLKIT_PATH}/nvvm $OUTPUTDIR/third_party/gpus/cuda/nvvm ||
 
 # Set up symbolic link for cudnn
 ln -sf $CUDNN_HEADER_PATH/cudnn.h $OUTPUTDIR/third_party/gpus/cuda/include/cudnn.h || exit -1
-ln -sf $CUDNN_LIB_PATH/libcudnn.so.$TF_CUDNN_VERSION $OUTPUTDIR/third_party/gpus/cuda/lib64/libcudnn.so.$TF_CUDNN_VERSION || exit -1
+ln -sf $CUDNN_LIB_PATH/libcudnn.so$TF_CUDNN_VERSION $OUTPUTDIR/third_party/gpus/cuda/lib64/libcudnn.so$TF_CUDNN_VERSION || exit -1
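
Note on the `ci_parameterized_build.sh` change in this patch: the script now writes its assembled commands to a temporary script, runs that script, records success or failure, and removes the file afterwards. The pattern can be sketched roughly as below. This is only an illustration under stated assumptions: it uses `mktemp` rather than the patch's `/dev/urandom`-derived name, it runs the file through `bash` instead of marking it executable, and `run_generated_script` is a hypothetical helper that does not exist in the TensorFlow tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): write each argument as one
# line of a throwaway script, run it, and return 0 on success, 1 on failure.
run_generated_script() {
  local tmp_script
  # Assumption: mktemp replaces the patch's /dev/urandom-based random name.
  tmp_script=$(mktemp "${TMPDIR:-/tmp}/ci_build_XXXXXXXX") || return 1

  # First line is the shebang; every argument becomes one command line.
  printf '%s\n' '#!/bin/bash' "$@" > "${tmp_script}"

  # Record the outcome instead of aborting, so cleanup always runs.
  local failure=0
  bash "${tmp_script}" || failure=1

  rm -f "${tmp_script}"
  return "${failure}"
}

# Two-step command, analogous to a parallel build followed by serial tests.
run_generated_script 'echo "build step"' 'echo "test step"' && echo "SUCCESS"

# A failing step is reported through the return code.
run_generated_script 'echo "build step" && false' || echo "FAILURE"
```

Capturing the exit status in a variable before the `rm -f` mirrors the `FAILURE=0 || FAILURE=1` idiom in the patch: the temporary file is removed whether or not the generated commands succeed.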