Merge changes from github.

Change: 155209832
author: Dan Ringwalt <ringwalt@google.com> 2017-05-05 09:09:05 -0800
committer: TensorFlower Gardener <gardener@tensorflow.org> 2017-05-05 10:26:00 -0700
commit: 692fad20f913ffa2cb874a87578ecabb03cc4557 (patch)
tree: 172717f537c91b0d1ac0366731b4eb2093fb743b
parent: b329dd821e29e64c93b1b9bf38e61871c6cb53da (diff)
101 files changed, 2167 insertions, 1375 deletions
diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md
index d0979e87f7..6f4c048ce8 100644
--- a/ISSUE_TEMPLATE.md
+++ b/ISSUE_TEMPLATE.md
@@ -1,33 +1,36 @@
-Please go to Stack Overflow for help and support. http://stackoverflow.com/questions/tagged/tensorflow
+Please go to Stack Overflow for help and support:
+
+http://stackoverflow.com/questions/tagged/tensorflow
+
 If you open a GitHub issue, here is our policy:
 
-1. It must be a bug or feature request.
+1. It must be a bug or a feature request.
 2. The form below must be filled out.
 
-**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g. fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
+**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
 
 ------------------------
 
-Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.
+### System information
+- **Have I written custom code (as opposed to using a stock example script provided in TensorFlow)**:
+- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
+- **TensorFlow installed from (source or binary)**:
+- **TensorFlow version (use command below)**:
+- **Bazel version (if compiling from source)**:
+- **CUDA/cuDNN version**:
+- **GPU model and memory**:
+- **Exact command to reproduce**:
 
-### System Information
-- *Have I written custom code (as opposed to using a stock example script provided in TensorFlow)?*:
-- *OS Platform and Distribution (i.e. Linux Ubuntu 16.0)*:
-- *TensorFlow installed from (source or binary)?*:
-- *TensorFlow version* (use command below):
-- *Bazel version (if compiling from source)*:
-- *CUDA/cuDNN version*:
-- *GPU Model and Memory*:
-- *Exact command to reproduce*:
+You can collect some of this information using our environment capture script:
 
-You can collect some of this information using our environment capture script https://github.com/tensorflow/tensorflow/blob/master/tools/
-You can collect the TensorFlow version with
-```sh
-python -c "import tensorflow as tf; print (tf.GIT_VERSION, tf.VERSION)"
-```
+https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
 
+You can obtain the TensorFlow version with
 
-### Describe the problem clearly
+python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
+
+### Describe the problem
+Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.
 
-### Source Code / Logs
-Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full-traceback. Large logs and files should be attached. Try to reproducible test-case code the bare-minimum necessary to generate the problem
+### Source code / logs
+Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
diff --git a/README.md b/README.md
index 951e7c3b9f..2878dab260 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ guidelines](CONTRIBUTING.md).**
 
 **We use [GitHub issues](https://github.com/tensorflow/tensorflow/issues) for
 tracking requests and bugs, but please see
-[Community](tensorflow/docs_src/about/index.md#community) for general questions
+[Community](https://www.tensorflow.org/community/) for general questions
 and discussion.**
 
 ## Installation
@@ -34,13 +34,12 @@ and discussion.**
 
 People who are a little more adventurous can also try our nightly binaries:
 
-
-* Linux CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/))
-* Linux GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/))
-* Mac CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
-* Mac GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/))
-* Windows CPU-only: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.1.0rc2-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/))
-* Windows GPU: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.1.0rc2-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/))
+* Linux CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/))
+* Linux GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/))
+* Mac CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
+* Mac GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/))
+* Windows CPU-only: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/))
+* Windows GPU: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/))
 * Android: [demo APK](https://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/tensorflow_demo.apk), [native libs](http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/native/)
 ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-android/))
 
diff --git a/RELEASE.md b/RELEASE.md
index fe6d052640..f078d336ab 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,6 +1,7 @@
 # Changes since the last release
 
 ## Major Features and Improvements
+* Added `tf.layers.conv3d_transpose` layer for spatio-temporal deconvolution.
 * Added `tf.Session.make_callable()`, which provides a lower overhead means of running a similar step multiple times.
 * Added ibverbs-based RDMA support to contrib (courtesy @junshi15 from Yahoo).
 * `RNNCell` objects now subclass `tf.layers._Layer`.  The strictness described
diff --git a/configure b/configure
index d7dde98292..7f68a8f5d4 100755
--- a/configure
+++ b/configure
@@ -35,12 +35,9 @@ function is_windows() {
   fi
 }
 
-function sed_hyphen_i() {
-  if is_macos; then
-    sed -i '' "$@"
-  else
-    sed -i "$@"
-  fi
+function sed_in_place() {
+  sed -e "$1" "$2" > "$2.bak"
+  mv "$2.bak" "$2"
 }
 
 function write_to_bazelrc() {
@@ -170,7 +167,7 @@ function setup_python {
 rm -f .tf_configure.bazelrc
 touch .tf_configure.bazelrc
 touch .bazelrc
-sed_hyphen_i "/tf_configure/d" .bazelrc
+sed_in_place "/tf_configure/d" .bazelrc
 echo "import %workspace%/.tf_configure.bazelrc" >> .bazelrc
 
 # Delete any leftover BUILD files from the Makefile build, which would interfere
@@ -409,31 +406,6 @@ done
 export TF_CUDA_CLANG
 write_action_env_to_bazelrc "TF_CUDA_CLANG" "$TF_CUDA_CLANG"
 
-# Set up which gcc nvcc should use as the host compiler
-# No need to set this on Windows
-while [[ "$TF_CUDA_CLANG" != "1" ]] && ! is_windows && true; do
-  fromuser=""
-  if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
-    default_gcc_host_compiler_path=$(which gcc || true)
-    read -p "Please specify which gcc should be used by nvcc as the host compiler. [Default is $default_gcc_host_compiler_path]: " GCC_HOST_COMPILER_PATH
-    fromuser="1"
-    if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
-      GCC_HOST_COMPILER_PATH="$default_gcc_host_compiler_path"
-    fi
-  fi
-  if [ -e "$GCC_HOST_COMPILER_PATH" ]; then
-    export GCC_HOST_COMPILER_PATH
-    write_action_env_to_bazelrc "GCC_HOST_COMPILER_PATH" "$GCC_HOST_COMPILER_PATH"
-    break
-  fi
-  echo "Invalid gcc path. ${GCC_HOST_COMPILER_PATH} cannot be found" 1>&2
-  if [ -z "$fromuser" ]; then
-    exit 1
-  fi
-  GCC_HOST_COMPILER_PATH=""
-  # Retry
-done
-
 # Set up which clang we should use as the cuda / host compiler.
 while [[ "$TF_CUDA_CLANG" == "1" ]] && true; do
   fromuser=""
@@ -474,6 +446,11 @@ while true; do
       else
         default_cuda_path="$(cygpath -m "$CUDA_PATH")"
       fi
+    elif is_linux; then
+      # If the default doesn't exist, try an alternative default.
+      if [ ! -d "$default_cuda_path" ] && [ -d /opt/cuda ]; then
+        default_cuda_path=/opt/cuda
+      fi
     fi
     read -p "Please specify the location where CUDA $TF_CUDA_VERSION toolkit is installed. Refer to README.md for more details. [Default is $default_cuda_path]: " CUDA_TOOLKIT_PATH
     fromuser="1"
@@ -513,6 +490,35 @@ while true; do
   CUDA_TOOLKIT_PATH=""
 done
 
+# Set up which gcc nvcc should use as the host compiler
+# No need to set this on Windows
+while [[ "$TF_CUDA_CLANG" != "1" ]] && ! is_windows && true; do
+  fromuser=""
+  if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
+    default_gcc_host_compiler_path=$(which gcc || true)
+    cuda_bin_symlink="$CUDA_TOOLKIT_PATH/bin/gcc"
+    if [ -L "$cuda_bin_symlink" ]; then
+      default_gcc_host_compiler_path=$(readlink "$cuda_bin_symlink")
+    fi
+    read -p "Please specify which gcc should be used by nvcc as the host compiler. [Default is $default_gcc_host_compiler_path]: " GCC_HOST_COMPILER_PATH
+    fromuser="1"
+    if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
+      GCC_HOST_COMPILER_PATH="$default_gcc_host_compiler_path"
+    fi
+  fi
+  if [ -e "$GCC_HOST_COMPILER_PATH" ]; then
+    export GCC_HOST_COMPILER_PATH
+    write_action_env_to_bazelrc "GCC_HOST_COMPILER_PATH" "$GCC_HOST_COMPILER_PATH"
+    break
+  fi
+  echo "Invalid gcc path. ${GCC_HOST_COMPILER_PATH} cannot be found" 1>&2
+  if [ -z "$fromuser" ]; then
+    exit 1
+  fi
+  GCC_HOST_COMPILER_PATH=""
+  # Retry
+done
+
 # Find out where the cuDNN library is installed
 while true; do
   # Configure the cuDNN version to use.
diff --git a/tensorflow/BUILD b/tensorflow/BUILD
index a2f7a9fb63..a36a57314d 100644
--- a/tensorflow/BUILD
+++ b/tensorflow/BUILD
@@ -255,6 +255,7 @@ filegroup(
         "//tensorflow/contrib/seq2seq:all_files",
         "//tensorflow/contrib/session_bundle:all_files",
         "//tensorflow/contrib/session_bundle/example:all_files",
+        "//tensorflow/contrib/signal:all_files",
         "//tensorflow/contrib/slim:all_files",
         "//tensorflow/contrib/slim/python/slim/data:all_files",
         "//tensorflow/contrib/slim/python/slim/nets:all_files",
diff --git a/tensorflow/contrib/batching/BUILD b/tensorflow/contrib/batching/BUILD
index 1f1e2db69b..b441906cd4 100644
--- a/tensorflow/contrib/batching/BUILD
+++ b/tensorflow/contrib/batching/BUILD
@@ -181,6 +181,7 @@ py_test(
     size = "small",
     srcs = ["python/ops/batch_ops_test.py"],
     srcs_version = "PY2AND3",
+    tags = ["nomac"],
     deps = [
         ":batch_py",
         "//tensorflow/python:framework_test_lib",
diff --git a/tensorflow/contrib/factorization/python/ops/gmm_ops.py b/tensorflow/contrib/factorization/python/ops/gmm_ops.py
index 42544ff884..b092eab316 100644
--- a/tensorflow/contrib/factorization/python/ops/gmm_ops.py
+++ b/tensorflow/contrib/factorization/python/ops/gmm_ops.py
@@ -85,7 +85,7 @@ def _init_clusters_random(data, num_clusters, random_seed):
         maxval=math_ops.cast(num_data, dtypes.int64),
         seed=random_seed,
         dtype=dtypes.int64)
-  indices = math_ops.cast(indices, dtypes.int32) % num_data
+  indices %= math_ops.cast(num_data, dtypes.int64)
   clusters_init = embedding_lookup(data, indices, partition_strategy='div')
   return clusters_init
 
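In `_init_clusters_random`, the initial cluster centers are rows of `data` chosen by random int64 indices; the fix keeps `indices` as int64 and casts `num_data` to int64 for the modulo, instead of casting the indices down to int32 and mixing dtypes. The following is a hypothetical NumPy analogue for illustration only: it uses `numpy.random.default_rng(...).integers` in place of TensorFlow's random op and plain row indexing in place of `embedding_lookup`, and the function name `init_clusters_random` is an assumption, not TensorFlow API.

```python
import numpy as np


def init_clusters_random(data, num_clusters, random_seed=None):
    """Pick `num_clusters` rows of `data` uniformly at random (with replacement).

    Hypothetical NumPy analogue of gmm_ops._init_clusters_random: indices are
    drawn as int64 and reduced modulo an int64 `num_data`, so no int32/int64
    mix occurs.
    """
    data = np.asarray(data)
    num_data = np.int64(data.shape[0])
    rng = np.random.default_rng(random_seed)
    # Draw indices in [0, num_data) directly as int64.
    indices = rng.integers(0, num_data, size=num_clusters, dtype=np.int64)
    # Defensive modulo with matching dtypes, mirroring the patched line.
    indices %= num_data
    # Row lookup stands in for embedding_lookup(data, indices).
    return data[indices]


if __name__ == "__main__":
    points = np.arange(20, dtype=np.float32).reshape(10, 2)
    print(init_clusters_random(points, 3, random_seed=0))
```

Because sampling is with replacement, two initial centers can coincide when `num_clusters` is close to the number of rows.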
diff --git a/tensorflow/contrib/grid_rnn/python/kernel_tests/grid_rnn_test.py b/tensorflow/contrib/grid_rnn/python/kernel_tests/grid_rnn_test.py
index 758e0bcc07..280271a42d 100644
--- a/tensorflow/contrib/grid_rnn/python/kernel_tests/grid_rnn_test.py
+++ b/tensorflow/contrib/grid_rnn/python/kernel_tests/grid_rnn_test.py
@@ -34,180 +34,228 @@ from tensorflow.python.platform import test
 class GridRNNCellTest(test.TestCase):
 
   def testGrid2BasicLSTMCell(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.2)) as root_scope:
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
         cell = grid_rnn_cell.Grid2BasicLSTMCell(2)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
         })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.36617181, 0.36617181]])
-        self.assertAllClose(res[1], [[0.71053141, 0.71053141, 0.36617181,
-                                      0.36617181, 0.72320831, 0.80555487,
-                                      0.39102408, 0.42150158]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+
+        self.assertAllClose(res_g, ([[0.36617181, 0.36617181]],))
+        self.assertAllClose(
+            res_s, (([[0.71053141, 0.71053141]], [[0.36617181, 0.36617181]]),
+                    ([[0.72320831, 0.80555487]], [[0.39102408, 0.42150158]])))
 
         # emulate a loop through the input sequence,
         # where we call cell() multiple times
         root_scope.reuse_variables()
         g2, s2 = cell(x, m)
-        self.assertEqual(g2.get_shape(), (1, 2))
-        self.assertEqual(s2.get_shape(), (1, 8))
-
-        res = sess.run([g2, s2], {x: np.array([[2., 2., 2.]]), m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.58847463, 0.58847463]])
-        self.assertAllClose(res[1], [[1.40469193, 1.40469193, 0.58847463,
-                                      0.58847463, 0.97726452, 1.04626071,
-                                      0.4927212, 0.51137757]])
+        self.assertEqual(g2[0].get_shape(), (1, 2))
+        self.assertEqual(s2[0].c.get_shape(), (1, 2))
+        self.assertEqual(s2[0].h.get_shape(), (1, 2))
+        self.assertEqual(s2[1].c.get_shape(), (1, 2))
+        self.assertEqual(s2[1].h.get_shape(), (1, 2))
+
+        res_g2, res_s2 = sess.run([g2, s2],
+                                  {x: np.array([[2., 2., 2.]]),
+                                   m: res_s})
+        self.assertEqual(res_g2[0].shape, (1, 2))
+        self.assertEqual(res_s2[0].c.shape, (1, 2))
+        self.assertEqual(res_s2[0].h.shape, (1, 2))
+        self.assertEqual(res_s2[1].c.shape, (1, 2))
+        self.assertEqual(res_s2[1].h.shape, (1, 2))
+        self.assertAllClose(res_g2[0], [[0.58847463, 0.58847463]])
+        self.assertAllClose(
+            res_s2, (([[1.40469193, 1.40469193]], [[0.58847463, 0.58847463]]),
+                     ([[0.97726452, 1.04626071]], [[0.4927212, 0.51137757]])))
 
   def testGrid2BasicLSTMCellTied(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.2)):
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
         cell = grid_rnn_cell.Grid2BasicLSTMCell(2, tied=True)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
         })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.36617181, 0.36617181]])
-        self.assertAllClose(res[1], [[0.71053141, 0.71053141, 0.36617181,
-                                      0.36617181, 0.72320831, 0.80555487,
-                                      0.39102408, 0.42150158]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
 
-        res = sess.run([g, s], {x: np.array([[1., 1., 1.]]), m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.36703536, 0.36703536]])
-        self.assertAllClose(res[1], [[0.71200621, 0.71200621, 0.36703536,
-                                      0.36703536, 0.80941606, 0.87550586,
-                                      0.40108523, 0.42199609]])
+        self.assertAllClose(res_g[0], [[0.36617181, 0.36617181]])
+        self.assertAllClose(
+            res_s, (([[0.71053141, 0.71053141]], [[0.36617181, 0.36617181]]),
+                    ([[0.72320831, 0.80555487]], [[0.39102408, 0.42150158]])))
+
+        res_g, res_s = sess.run([g, s], {x: np.array([[1., 1., 1.]]), m: res_s})
+        self.assertEqual(res_g[0].shape, (1, 2))
+
+        self.assertAllClose(res_g[0], [[0.36703536, 0.36703536]])
+        self.assertAllClose(
+            res_s, (([[0.71200621, 0.71200621]], [[0.36703536, 0.36703536]]),
+                    ([[0.80941606, 0.87550586]], [[0.40108523, 0.42199609]])))
 
   def testGrid2BasicLSTMCellWithRelu(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.2)):
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
         cell = grid_rnn_cell.Grid2BasicLSTMCell(
             2, tied=False, non_recurrent_fn=nn_ops.relu)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, ((2, 2),))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1., 1.]]),
-             m: np.array([[0.1, 0.2, 0.3, 0.4]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.31667367, 0.31667367]])
-        self.assertAllClose(res[1], [[0.29530135, 0.37520045, 0.17044567,
-                                      0.21292259]])
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),)
+        })
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertAllClose(res_g[0], [[0.31667367, 0.31667367]])
+        self.assertAllClose(res_s, (([[0.29530135, 0.37520045]],
+                                     [[0.17044567, 0.21292259]]),))
 
   """LSTMCell
   """
 
   def testGrid2LSTMCell(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
         cell = grid_rnn_cell.Grid2LSTMCell(2, use_peepholes=True)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
         })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.95686918, 0.95686918]])
-        self.assertAllClose(res[1], [[2.41515064, 2.41515064, 0.95686918,
-                                      0.95686918, 1.38917875, 1.49043763,
-                                      0.83884692, 0.86036491]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+
+        self.assertAllClose(res_g[0], [[0.95686918, 0.95686918]])
+        self.assertAllClose(
+            res_s, (([[2.41515064, 2.41515064]], [[0.95686918, 0.95686918]]),
+                    ([[1.38917875, 1.49043763]], [[0.83884692, 0.86036491]])))
 
   def testGrid2LSTMCellTied(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
         cell = grid_rnn_cell.Grid2LSTMCell(2, tied=True, use_peepholes=True)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
         })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.95686918, 0.95686918]])
-        self.assertAllClose(res[1], [[2.41515064, 2.41515064, 0.95686918,
-                                      0.95686918, 1.38917875, 1.49043763,
-                                      0.83884692, 0.86036491]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+
+        self.assertAllClose(res_g[0], [[0.95686918, 0.95686918]])
+        self.assertAllClose(
+            res_s, (([[2.41515064, 2.41515064]], [[0.95686918, 0.95686918]]),
+                    ([[1.38917875, 1.49043763]], [[0.83884692, 0.86036491]])))
 
   def testGrid2LSTMCellWithRelu(self):
     with self.test_session() as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
         cell = grid_rnn_cell.Grid2LSTMCell(
             2, use_peepholes=True, non_recurrent_fn=nn_ops.relu)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, ((2, 2),))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1., 1.]]),
-             m: np.array([[0.1, 0.2, 0.3, 0.4]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[2.1831727, 2.1831727]])
-        self.assertAllClose(res[1], [[0.92270052, 1.02325559, 0.66159075,
-                                      0.70475441]])
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),)
+        })
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertAllClose(res_g[0], [[2.1831727, 2.1831727]])
+        self.assertAllClose(res_s, (([[0.92270052, 1.02325559]],
+                                     [[0.66159075, 0.70475441]]),))
 
   """RNNCell
   """
@@ -217,74 +265,84 @@ class GridRNNCellTest(test.TestCase):
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([2, 2])
-        m = array_ops.zeros([2, 4])
+        m = (array_ops.zeros([2, 2]), array_ops.zeros([2, 2]))
         cell = grid_rnn_cell.Grid2BasicRNNCell(2)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, (2, 2))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (2, 2))
-        self.assertEqual(s.get_shape(), (2, 4))
+        self.assertEqual(g[0].get_shape(), (2, 2))
+        self.assertEqual(s[0].get_shape(), (2, 2))
+        self.assertEqual(s[1].get_shape(), (2, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1.], [2., 2.]]),
-            m: np.array([[0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1.], [2., 2.]]),
+            m: (np.array([[0.1, 0.1], [0.2, 0.2]]),
+                np.array([[0.1, 0.1], [0.2, 0.2]]))
         })
-        self.assertEqual(res[0].shape, (2, 2))
-        self.assertEqual(res[1].shape, (2, 4))
-        self.assertAllClose(res[0], [[0.94685763, 0.94685763],
-                                     [0.99480951, 0.99480951]])
-        self.assertAllClose(res[1],
-                            [[0.94685763, 0.94685763, 0.80049908, 0.80049908],
-                             [0.99480951, 0.99480951, 0.97574311, 0.97574311]])
+        self.assertEqual(res_g[0].shape, (2, 2))
+        self.assertEqual(res_s[0].shape, (2, 2))
+        self.assertEqual(res_s[1].shape, (2, 2))
+
+        self.assertAllClose(res_g, ([[0.94685763, 0.94685763],
+                                     [0.99480951, 0.99480951]],))
+        self.assertAllClose(
+            res_s, ([[0.94685763, 0.94685763], [0.99480951, 0.99480951]],
+                    [[0.80049908, 0.80049908], [0.97574311, 0.97574311]]))
 
   def testGrid2BasicRNNCellTied(self):
     with self.test_session() as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([2, 2])
-        m = array_ops.zeros([2, 4])
+        m = (array_ops.zeros([2, 2]), array_ops.zeros([2, 2]))
         cell = grid_rnn_cell.Grid2BasicRNNCell(2, tied=True)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, (2, 2))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (2, 2))
-        self.assertEqual(s.get_shape(), (2, 4))
+        self.assertEqual(g[0].get_shape(), (2, 2))
+        self.assertEqual(s[0].get_shape(), (2, 2))
+        self.assertEqual(s[1].get_shape(), (2, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1.], [2., 2.]]),
-            m: np.array([[0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1.], [2., 2.]]),
+            m: (np.array([[0.1, 0.1], [0.2, 0.2]]),
+                np.array([[0.1, 0.1], [0.2, 0.2]]))
         })
-        self.assertEqual(res[0].shape, (2, 2))
-        self.assertEqual(res[1].shape, (2, 4))
-        self.assertAllClose(res[0], [[0.94685763, 0.94685763],
-                                     [0.99480951, 0.99480951]])
-        self.assertAllClose(res[1],
-                            [[0.94685763, 0.94685763, 0.80049908, 0.80049908],
-                             [0.99480951, 0.99480951, 0.97574311, 0.97574311]])
+        self.assertEqual(res_g[0].shape, (2, 2))
+        self.assertEqual(res_s[0].shape, (2, 2))
+        self.assertEqual(res_s[1].shape, (2, 2))
+
+        self.assertAllClose(res_g, ([[0.94685763, 0.94685763],
+                                     [0.99480951, 0.99480951]],))
+        self.assertAllClose(
+            res_s, ([[0.94685763, 0.94685763], [0.99480951, 0.99480951]],
+                    [[0.80049908, 0.80049908], [0.97574311, 0.97574311]]))
 
   def testGrid2BasicRNNCellWithRelu(self):
     with self.test_session() as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([1, 2])
-        m = array_ops.zeros([1, 2])
+        m = (array_ops.zeros([1, 2]),)
         cell = grid_rnn_cell.Grid2BasicRNNCell(2, non_recurrent_fn=nn_ops.relu)
-        self.assertEqual(cell.state_size, 2)
+        self.assertEqual(cell.state_size, (2,))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 2))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s],
-                       {x: np.array([[1., 1.]]),
-                        m: np.array([[0.1, 0.1]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 2))
-        self.assertAllClose(res[0], [[1.80049896, 1.80049896]])
-        self.assertAllClose(res[1], [[0.80049896, 0.80049896]])
+        res_g, res_s = sess.run(
+            [g, s], {x: np.array([[1., 1.]]),
+                     m: (np.array([[0.1, 0.1]]),)})
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].shape, (1, 2))
+        self.assertAllClose(res_g, ([[1.80049896, 1.80049896]],))
+        self.assertAllClose(res_s, ([[0.80049896, 0.80049896]],))
 
   """1-LSTM
   """
@@ -294,51 +352,59 @@ class GridRNNCellTest(test.TestCase):
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)) as root_scope:
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
         cell = grid_rnn_cell.Grid1LSTMCell(2, use_peepholes=True)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, ((2, 2),))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1., 1.]]),
-             m: np.array([[0.1, 0.2, 0.3, 0.4]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.91287315, 0.91287315]])
-        self.assertAllClose(res[1],
-                            [[2.26285243, 2.26285243, 0.91287315, 0.91287315]])
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),)
+        })
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+
+        self.assertAllClose(res_g, ([[0.91287315, 0.91287315]],))
+        self.assertAllClose(res_s, (([[2.26285243, 2.26285243]],
+                                     [[0.91287315, 0.91287315]]),))
 
         root_scope.reuse_variables()
 
         x2 = array_ops.zeros([0, 0])
         g2, s2 = cell(x2, m)
-        self.assertEqual(g2.get_shape(), (1, 2))
-        self.assertEqual(s2.get_shape(), (1, 4))
+        self.assertEqual(g2[0].get_shape(), (1, 2))
+        self.assertEqual(s2[0].c.get_shape(), (1, 2))
+        self.assertEqual(s2[0].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g2, s2], {m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.9032144, 0.9032144]])
-        self.assertAllClose(res[1],
-                            [[2.79966092, 2.79966092, 0.9032144, 0.9032144]])
+        res_g2, res_s2 = sess.run([g2, s2], {m: res_s})
+        self.assertEqual(res_g2[0].shape, (1, 2))
+        self.assertEqual(res_s2[0].c.shape, (1, 2))
+        self.assertEqual(res_s2[0].h.shape, (1, 2))
+
+        self.assertAllClose(res_g2, ([[0.9032144, 0.9032144]],))
+        self.assertAllClose(res_s2, (([[2.79966092, 2.79966092]],
+                                      [[0.9032144, 0.9032144]]),))
 
         g3, s3 = cell(x2, m)
-        self.assertEqual(g3.get_shape(), (1, 2))
-        self.assertEqual(s3.get_shape(), (1, 4))
+        self.assertEqual(g3[0].get_shape(), (1, 2))
+        self.assertEqual(s3[0].c.get_shape(), (1, 2))
+        self.assertEqual(s3[0].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g3, s3], {m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.92727238, 0.92727238]])
-        self.assertAllClose(res[1],
-                            [[3.3529923, 3.3529923, 0.92727238, 0.92727238]])
+        res_g3, res_s3 = sess.run([g3, s3], {m: res_s2})
+        self.assertEqual(res_g3[0].shape, (1, 2))
+        self.assertEqual(res_s3[0].c.shape, (1, 2))
+        self.assertEqual(res_s3[0].h.shape, (1, 2))
+        self.assertAllClose(res_g3, ([[0.92727238, 0.92727238]],))
+        self.assertAllClose(res_s3, (([[3.3529923, 3.3529923]],
+                                      [[0.92727238, 0.92727238]]),))
 
   """3-LSTM
   """
@@ -348,32 +414,42 @@ class GridRNNCellTest(test.TestCase):
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 12])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
         cell = grid_rnn_cell.Grid3LSTMCell(2, use_peepholes=True)
-        self.assertEqual(cell.state_size, 12)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2), (2, 2)))
 
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 12))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))
+        self.assertEqual(s[2].c.get_shape(), (1, 2))
+        self.assertEqual(s[2].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
+        res_g, res_s = sess.run([g, s], {
             x:
                 np.array([[1., 1., 1.]]),
-            m:
-                np.array([[
-                    0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, -0.1, -0.2, -0.3,
-                    -0.4
-                ]])
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])),
+                (np.array([[-0.1, -0.2]]), np.array([[-0.3, -0.4]])))
         })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 12))
-
-        self.assertAllClose(res[0], [[0.96892911, 0.96892911]])
-        self.assertAllClose(res[1], [[2.45227885, 2.45227885, 0.96892911,
-                                      0.96892911, 1.33592629, 1.4373529,
-                                      0.80867189, 0.83247656, 0.7317788,
-                                      0.63205892, 0.56548983, 0.50446129]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+        self.assertEqual(res_s[2].c.shape, (1, 2))
+        self.assertEqual(res_s[2].h.shape, (1, 2))
+
+        self.assertAllClose(res_g, ([[0.96892911, 0.96892911]],))
+        self.assertAllClose(
+            res_s, (([[2.45227885, 2.45227885]], [[0.96892911, 0.96892911]]),
+                    ([[1.33592629, 1.4373529]], [[0.80867189, 0.83247656]]),
+                    ([[0.7317788, 0.63205892]], [[0.56548983, 0.50446129]])))
 
   """Edge cases
   """
@@ -383,7 +459,7 @@ class GridRNNCellTest(test.TestCase):
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([3, 2])
-        m = array_ops.zeros([0, 0])
+        m = ()
 
         # this is equivalent to relu
         cell = grid_rnn_cell.GridRNNCell(
@@ -394,21 +470,22 @@ class GridRNNCellTest(test.TestCase):
             non_recurrent_dims=0,
             non_recurrent_fn=nn_ops.relu)
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (3, 2))
-        self.assertEqual(s.get_shape(), (0, 0))
+        self.assertEqual(g[0].get_shape(), (3, 2))
+        self.assertEqual(s, ())
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {x: np.array([[1., -1.], [-2, 1], [2, -1]])})
-        self.assertEqual(res[0].shape, (3, 2))
-        self.assertEqual(res[1].shape, (0, 0))
-        self.assertAllClose(res[0], [[0, 0], [0, 0], [0.5, 0.5]])
+        res_g, res_s = sess.run([g, s],
+                                {x: np.array([[1., -1.], [-2, 1], [2, -1]])})
+        self.assertEqual(res_g[0].shape, (3, 2))
+        self.assertEqual(res_s, ())
+        self.assertAllClose(res_g, ([[0, 0], [0, 0], [0.5, 0.5]],))
 
   def testGridRNNEdgeCasesNoOutput(self):
     with self.test_session() as sess:
       with variable_scope.variable_scope(
           'root', initializer=init_ops.constant_initializer(0.5)):
         x = array_ops.zeros([1, 2])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
 
         # This cell produces no output
         cell = grid_rnn_cell.GridRNNCell(
@@ -419,16 +496,18 @@ class GridRNNCellTest(test.TestCase):
             non_recurrent_dims=0,
             non_recurrent_fn=nn_ops.relu)
         g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (0, 0))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g, ())
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
 
         sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1.]]),
-             m: np.array([[0.1, 0.1, 0.1, 0.1]])})
-        self.assertEqual(res[0].shape, (0, 0))
-        self.assertEqual(res[1].shape, (1, 4))
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1.]]),
+            m: ((np.array([[0.1, 0.1]]), np.array([[0.1, 0.1]])),)
+        })
+        self.assertEqual(res_g, ())
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
 
   """Test with tf.nn.rnn
   """
@@ -451,20 +530,29 @@ class GridRNNCellTest(test.TestCase):
       outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)
 
     self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 8))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].h.get_shape(), (batch_size, 2))
 
     for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape()[0], inp.get_shape()[0])
-      self.assertEqual(out.get_shape()[1], num_units)
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape()[0], inp.get_shape()[0])
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)
 
     with self.test_session() as sess:
       sess.run(variables.global_variables_initializer())
 
       input_value = np.ones((batch_size, input_size))
       values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))
 
   def testGrid2LSTMCellReLUWithRNN(self):
     batch_size = 3
@@ -478,27 +566,33 @@ class GridRNNCellTest(test.TestCase):
           num_units=num_units, non_recurrent_fn=nn_ops.relu)
 
       inputs = max_length * [
-          array_ops.placeholder(
-              dtypes.float32, shape=(batch_size, input_size))
+          array_ops.placeholder(dtypes.float32, shape=(batch_size, input_size))
       ]
 
       outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)
 
     self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 4))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))
 
     for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape()[0], inp.get_shape()[0])
-      self.assertEqual(out.get_shape()[1], num_units)
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape()[0], inp.get_shape()[0])
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)
 
     with self.test_session() as sess:
       sess.run(variables.global_variables_initializer())
 
       input_value = np.ones((batch_size, input_size))
       values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))
 
   def testGrid3LSTMCellReLUWithRNN(self):
     batch_size = 3
@@ -512,27 +606,35 @@ class GridRNNCellTest(test.TestCase):
           num_units=num_units, non_recurrent_fn=nn_ops.relu)
 
       inputs = max_length * [
-          array_ops.placeholder(
-              dtypes.float32, shape=(batch_size, input_size))
+          array_ops.placeholder(dtypes.float32, shape=(batch_size, input_size))
       ]
 
       outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)
 
     self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 8))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].h.get_shape(), (batch_size, 2))
 
     for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape()[0], inp.get_shape()[0])
-      self.assertEqual(out.get_shape()[1], num_units)
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape()[0], inp.get_shape()[0])
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)
 
     with self.test_session() as sess:
       sess.run(variables.global_variables_initializer())
 
       input_value = np.ones((batch_size, input_size))
       values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))
 
   def testGrid1LSTMCellWithRNN(self):
     batch_size = 3
@@ -553,20 +655,91 @@ class GridRNNCellTest(test.TestCase):
       outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)
 
     self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 4))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))
 
     for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape(), (3, num_units))
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape(), (3, num_units))
+      self.assertEqual(out[0].dtype, inp.dtype)
 
     with self.test_session() as sess:
       sess.run(variables.global_variables_initializer())
 
       input_value = np.ones((batch_size, input_size))
       values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))
+
+  def testGrid2LSTMCellWithRNNAndDynamicBatchSize(self):
+    """Test for #4296."""
+    input_size = 5
+    max_length = 6  # unrolled up to this length
+    num_units = 2
+
+    with variable_scope.variable_scope(
+        'root', initializer=init_ops.constant_initializer(0.5)):
+      cell = grid_rnn_cell.Grid2LSTMCell(num_units=num_units)
 
+      inputs = max_length * [
+          array_ops.placeholder(dtypes.float32, shape=(None, input_size))
+      ]
+
+      outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)
+
+    self.assertEqual(len(outputs), len(inputs))
+
+    for out, inp in zip(outputs, inputs):
+      self.assertEqual(len(out), 1)
+      self.assertIsNone(out[0].get_shape()[0].value)
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)
+
+    with self.test_session() as sess:
+      sess.run(variables.global_variables_initializer())
+
+      input_value = np.ones((3, input_size))
+      values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))
+
+  def testGrid2LSTMCellLegacy(self):
+    """Test for legacy case (when state_is_tuple=False)."""
+    with self.test_session() as sess:
+      with variable_scope.variable_scope(
+          'root', initializer=init_ops.constant_initializer(0.5)):
+        x = array_ops.zeros([1, 3])
+        m = array_ops.zeros([1, 8])
+        cell = grid_rnn_cell.Grid2LSTMCell(
+            2, use_peepholes=True, state_is_tuple=False, output_is_tuple=False)
+        self.assertEqual(cell.state_size, 8)
+
+        g, s = cell(x, m)
+        self.assertEqual(g.get_shape(), (1, 2))
+        self.assertEqual(s.get_shape(), (1, 8))
+
+        sess.run([variables.global_variables_initializer()])
+        res = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        })
+        self.assertEqual(res[0].shape, (1, 2))
+        self.assertEqual(res[1].shape, (1, 8))
+        self.assertAllClose(res[0], [[0.95686918, 0.95686918]])
+        self.assertAllClose(res[1], [[
+            2.41515064, 2.41515064, 0.95686918, 0.95686918, 1.38917875,
+            1.49043763, 0.83884692, 0.86036491
+        ]])
 
 if __name__ == '__main__':
   test.main()
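The paired tests `testGrid2LSTMCell` and `testGrid2LSTMCellLegacy` feed and check the same numbers, which suggests that the legacy concatenated state is simply the new nested `((c, h), ...)` state flattened in order and joined along the column axis. The sketch below illustrates that correspondence under this assumption. `flatten_state`, `tuple_to_concat`, `concat_to_tuple` and `all_finite` are hypothetical helpers, not part of `grid_rnn_cell`, and they work on plain Python lists of rows rather than on tensors.

```python
import math


# Hypothetical helpers (not part of grid_rnn_cell). A "matrix" is a plain list
# of rows shaped (batch, units); nesting is expressed with tuples only.
def flatten_state(state):
    """Flatten a nested tuple state depth-first, left to right, into matrices."""
    if isinstance(state, tuple):  # also matches namedtuples like LSTMStateTuple
        leaves = []
        for sub in state:
            leaves.extend(flatten_state(sub))
        return leaves
    return [state]


def tuple_to_concat(state):
    """Assumed legacy layout: join every leaf along the column axis, row by row."""
    leaves = flatten_state(state)
    if not leaves:
        return []
    batch = len(leaves[0])
    return [[x for leaf in leaves for x in leaf[row]] for row in range(batch)]


def concat_to_tuple(flat, num_dims, num_units):
    """Inverse of tuple_to_concat for LSTM states: ((c_0, h_0), (c_1, h_1), ...)."""
    width = 2 * num_dims * num_units
    if any(len(row) != width for row in flat):
        raise ValueError("expected %d columns per row" % width)

    def columns(start):
        # Take num_units columns beginning at `start` from every row.
        return [row[start:start + num_units] for row in flat]

    return tuple((columns(2 * d * num_units), columns((2 * d + 1) * num_units))
                 for d in range(num_dims))


def all_finite(values):
    """Compact alternative to the nested np.isfinite loops in the tests."""
    return all(math.isfinite(x)
               for leaf in flatten_state(values) for row in leaf for x in row)


# Usage with the legacy state checked in testGrid2LSTMCellLegacy.
legacy = [[2.41515064, 2.41515064, 0.95686918, 0.95686918,
           1.38917875, 1.49043763, 0.83884692, 0.86036491]]
nested = concat_to_tuple(legacy, num_dims=2, num_units=2)
# nested[1] == ([[1.38917875, 1.49043763]], [[0.83884692, 0.86036491]])
assert tuple_to_concat(nested) == legacy
```

The recovered `nested` value matches the tuple-state expectation in `testGrid2LSTMCell`, which is what makes the assumed `[c_0, h_0, c_1, h_1]` ordering plausible. Restricting nesting to tuples mirrors how the tests build `m`, where lists appear only inside the numeric leaves.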
diff --git a/tensorflow/contrib/grid_rnn/python/ops/grid_rnn_cell.py b/tensorflow/contrib/grid_rnn/python/ops/grid_rnn_cell.py
index 269b224581..252788140f 100644
--- a/tensorflow/contrib/grid_rnn/python/ops/grid_rnn_cell.py
+++ b/tensorflow/contrib/grid_rnn/python/ops/grid_rnn_cell.py
@@ -25,6 +25,8 @@ from tensorflow.python.ops import array_ops
 from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import nn
 from tensorflow.python.ops import variable_scope as vs
+
+from tensorflow.python.platform import tf_logging as logging
 from tensorflow.contrib import layers
 from tensorflow.contrib import rnn
 
@@ -53,7 +55,9 @@ class GridRNNCell(rnn.RNNCell):
                non_recurrent_dims=None,
                tied=False,
                cell_fn=None,
-               non_recurrent_fn=None):
+               non_recurrent_fn=None,
+               state_is_tuple=True,
+               output_is_tuple=True):
     """Initialize the parameters of a Grid RNN cell
 
     Args:
@@ -68,26 +72,47 @@ class GridRNNCell(rnn.RNNCell):
       non_recurrent_dims: int or list, List of dimensions that are not
         recurrent.
               The transfer function for non-recurrent dimensions is specified
-                via `non_recurrent_fn`,
-              which is default to be `tensorflow.nn.relu`.
+                via `non_recurrent_fn`, which defaults to
+                `tensorflow.nn.relu`.
       tied: bool, Whether to share the weights among the dimensions of this
         GridRNN cell.
               If there are non-recurrent dimensions in the grid, weights are
-                shared between each
-              group of recurrent and non-recurrent dimensions.
-      cell_fn: function, a function which returns the recurrent cell object. Has
-        to be in the following signature:
-              def cell_func(num_units, input_size):
+                shared between each group of recurrent and non-recurrent
+                dimensions.
+      cell_fn: function, a function which returns the recurrent cell object.
+        It must have the following signature:
+              ```
+              def cell_func(num_units):
                 # ...
-
+              ```
               and returns an object of type `RNNCell`. If None, LSTMCell with
                 default parameters will be used.
+        Note that if you use a custom RNNCell (with `cell_fn`), it is your
+        responsibility to make sure the inner cell uses `state_is_tuple=True`.
+
       non_recurrent_fn: a tensorflow Op that will be the transfer function of
         the non-recurrent dimensions
+      state_is_tuple: If True, accepted and returned states are tuples of the
+        states of the recurrent dimensions. If False, they are concatenated
+        along the column axis. The latter behavior will soon be deprecated.
+
+        Note that if you use a custom RNNCell (with `cell_fn`), it is your
+        responsibility to make sure the inner cell uses `state_is_tuple=True`.
+
+      output_is_tuple: If True, the output is a tuple of the outputs of the
+        recurrent dimensions. If False, they are concatenated along the
+        column axis. The latter behavior will soon be deprecated.
 
     Raises:
       TypeError: if cell_fn does not return an RNNCell instance.
     """
+    if not state_is_tuple:
+      logging.warning('%s: Using a concatenated state is slower and will '
+                      'soon be deprecated.  Use state_is_tuple=True.', self)
+    if not output_is_tuple:
+      logging.warning('%s: Using a concatenated output is slower and will '
+                      'soon be deprecated.  Use output_is_tuple=True.', self)
+
     if num_dims < 1:
       raise ValueError('dims must be >= 1: {}'.format(num_dims))
 
@@ -96,37 +121,41 @@ class GridRNNCell(rnn.RNNCell):
                                      non_recurrent_fn or nn.relu, tied,
                                      num_units)
 
-    cell_input_size = (self._config.num_dims - 1) * num_units
+    self._state_is_tuple = state_is_tuple
+    self._output_is_tuple = output_is_tuple
+
     if cell_fn is None:
       my_cell_fn = functools.partial(
-          rnn.LSTMCell,
-          num_units=num_units, input_size=cell_input_size,
-          state_is_tuple=False)
+          rnn.LSTMCell, num_units=num_units, state_is_tuple=state_is_tuple)
     else:
-      my_cell_fn = lambda: cell_fn(num_units, cell_input_size)
+      my_cell_fn = lambda: cell_fn(num_units)
     if tied:
       self._cells = [my_cell_fn()] * num_dims
     else:
       self._cells = [my_cell_fn() for _ in range(num_dims)]
     if not isinstance(self._cells[0], rnn.RNNCell):
-      raise TypeError(
-          'cell_fn must return an RNNCell instance, saw: %s'
-          % type(self._cells[0]))
+      raise TypeError('cell_fn must return an RNNCell instance, saw: %s' %
+                      type(self._cells[0]))
 
-  @property
-  def input_size(self):
-    # temporarily using num_units as the input_size of each dimension.
-    # The actual input size only determined when this cell get invoked,
-    # so this information can be considered unreliable.
-    return self._config.num_units * len(self._config.inputs)
+    if self._output_is_tuple:
+      self._output_size = tuple(self._cells[0].output_size
+                                for _ in self._config.outputs)
+    else:
+      self._output_size = self._cells[0].output_size * len(self._config.outputs)
+
+    if self._state_is_tuple:
+      self._state_size = tuple(self._cells[0].state_size
+                               for _ in self._config.recurrents)
+    else:
+      self._state_size = self._cell_state_size() * len(self._config.recurrents)
 
   @property
   def output_size(self):
-    return self._cells[0].output_size * len(self._config.outputs)
+    return self._output_size
 
   @property
   def state_size(self):
-    return self._cells[0].state_size * len(self._config.recurrents)
+    return self._state_size
 
   def __call__(self, inputs, state, scope=None):
     """Run one step of GridRNN.
@@ -145,76 +174,148 @@ class GridRNNCell(rnn.RNNCell):
       - A 2D, batch x state_size, Tensor representing the new state of the cell
         after reading "inputs" when previous state was "state".
     """
-    state_sz = state.get_shape().as_list()[1]
-    if self.state_size != state_sz:
-      raise ValueError(
-          'Actual state size not same as specified: {} vs {}.'.format(
-              state_sz, self.state_size))
-
     conf = self._config
-    dtype = inputs.dtype if inputs is not None else state.dtype
+    dtype = inputs.dtype
 
-    # c_prev is `m`, and m_prev is `h` in the paper.
-    # Keep c and m here for consistency with the codebase
-    c_prev = [None] * self._config.num_dims
-    m_prev = [None] * self._config.num_dims
-    cell_output_size = self._cells[0].state_size - conf.num_units
-
-    # for LSTM   : state = memory cell + output, hence cell_output_size > 0
-    # for GRU/RNN: state = output (whose size is equal to _num_units),
-    #              hence cell_output_size = 0
-    for recurrent_dim, start_idx in zip(self._config.recurrents, range(
-        0, self.state_size, self._cells[0].state_size)):
-      if cell_output_size > 0:
-        c_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
-                                                [-1, conf.num_units])
-        m_prev[recurrent_dim] = array_ops.slice(
-            state, [0, start_idx + conf.num_units], [-1, cell_output_size])
-      else:
-        m_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
-                                                [-1, conf.num_units])
+    c_prev, m_prev, cell_output_size = self._extract_states(state)
 
     new_output = [None] * conf.num_dims
     new_state = [None] * conf.num_dims
 
     with vs.variable_scope(scope or type(self).__name__):  # GridRNNCell
+      # project input, populate c_prev and m_prev
+      self._project_input(inputs, c_prev, m_prev, cell_output_size > 0)
 
-      # project input
-      if inputs is not None and sum(inputs.get_shape().as_list()) > 0 and len(
-          conf.inputs) > 0:
-        input_splits = array_ops.split(
-            value=inputs, num_or_size_splits=len(conf.inputs), axis=1)
-        input_sz = input_splits[0].get_shape().as_list()[1]
-
-        for i, j in enumerate(conf.inputs):
-          input_project_m = vs.get_variable(
-              'project_m_{}'.format(j), [input_sz, conf.num_units], dtype=dtype)
-          m_prev[j] = math_ops.matmul(input_splits[i], input_project_m)
-
-          if cell_output_size > 0:
-            input_project_c = vs.get_variable(
-                'project_c_{}'.format(j), [input_sz, conf.num_units],
-                dtype=dtype)
-            c_prev[j] = math_ops.matmul(input_splits[i], input_project_c)
-
+      # propagate along dimensions, first for non-priority dimensions
+      # then priority dimensions
       _propagate(conf.non_priority, conf, self._cells, c_prev, m_prev,
                  new_output, new_state, True)
       _propagate(conf.priority, conf, self._cells,
                  c_prev, m_prev, new_output, new_state, False)
 
+      # collect outputs and states
       output_tensors = [new_output[i] for i in self._config.outputs]
-      output = array_ops.zeros(
-          [0, 0], dtype) if len(output_tensors) == 0 else array_ops.concat(
-              output_tensors, 1)
+      if self._output_is_tuple:
+        output = tuple(output_tensors)
+      else:
+        if output_tensors:
+          output = array_ops.concat(output_tensors, 1)
+        else:
+          output = array_ops.zeros([0, 0], dtype)
 
-      state_tensors = [new_state[i] for i in self._config.recurrents]
-      states = array_ops.zeros(
-          [0, 0],
-          dtype) if len(state_tensors) == 0 else array_ops.concat(state_tensors,
-                                                                  1)
+      if self._state_is_tuple:
+        states = tuple(new_state[i] for i in self._config.recurrents)
+      else:
+        # concat each state first, then flatten the whole thing
+        state_tensors = [
+            x for i in self._config.recurrents for x in new_state[i]
+        ]
+        if state_tensors:
+          states = array_ops.concat(state_tensors, 1)
+        else:
+          states = array_ops.zeros([0, 0], dtype)
 
     return output, states
 
+  def _extract_states(self, state):
+    """Extract the cell and previous output tensors from the given state.
+
+    Args:
+      state: The RNN state.
+
+    Returns:
+      Tuple of the cell value, previous output, and cell_output_size.
+
+    Raises:
+      ValueError: If len(self._config.recurrents) != len(state).
+    """
+    conf = self._config
+
+    # c_prev is `m` (cell value), and
+    # m_prev is `h` (previous output) in the paper.
+    # Keeping c and m here for consistency with the codebase
+    c_prev = [None] * conf.num_dims
+    m_prev = [None] * conf.num_dims
+
+    # for LSTM   : state = memory cell + output, hence cell_output_size > 0
+    # for GRU/RNN: state = output (whose size is equal to _num_units),
+    #              hence cell_output_size = 0
+    total_cell_state_size = self._cell_state_size()
+    cell_output_size = total_cell_state_size - conf.num_units
+
+    if self._state_is_tuple:
+      if len(conf.recurrents) != len(state):
+        raise ValueError('Expected state as a tuple of {} '
+                         'elements'.format(len(conf.recurrents)))
+
+      for recurrent_dim, recurrent_state in zip(conf.recurrents, state):
+        if cell_output_size > 0:
+          c_prev[recurrent_dim], m_prev[recurrent_dim] = recurrent_state
+        else:
+          m_prev[recurrent_dim] = recurrent_state
+    else:
+      for recurrent_dim, start_idx in zip(conf.recurrents,
+                                          range(0, self.state_size,
+                                                total_cell_state_size)):
+        if cell_output_size > 0:
+          c_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
+                                                  [-1, conf.num_units])
+          m_prev[recurrent_dim] = array_ops.slice(
+              state, [0, start_idx + conf.num_units], [-1, cell_output_size])
+        else:
+          m_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
+                                                  [-1, conf.num_units])
+    return c_prev, m_prev, cell_output_size
+
+  def _project_input(self, inputs, c_prev, m_prev, with_c):
+    """Fills in c_prev and m_prev with projected input, for input dimensions.
+
+    Args:
+      inputs: inputs tensor
+      c_prev: cell value
+      m_prev: previous output
+      with_c: boolean; whether to include project_c.
+
+    Raises:
+      ValueError: if len(self._config.inputs) != len(inputs).
+    """
+    conf = self._config
+
+    if (inputs is not None and inputs.get_shape().with_rank(2)[1].value > 0 and
+        conf.inputs):
+      if isinstance(inputs, tuple):
+        if len(conf.inputs) != len(inputs):
+          raise ValueError('Expected inputs as a tuple of {} '
+                           'tensors'.format(len(conf.inputs)))
+        input_splits = inputs
+      else:
+        input_splits = array_ops.split(
+            value=inputs, num_or_size_splits=len(conf.inputs), axis=1)
+      input_sz = input_splits[0].get_shape().with_rank(2)[1].value
+
+      for i, j in enumerate(conf.inputs):
+        input_project_m = vs.get_variable(
+            'project_m_{}'.format(j), [input_sz, conf.num_units],
+            dtype=inputs.dtype)
+        m_prev[j] = math_ops.matmul(input_splits[i], input_project_m)
+
+        if with_c:
+          input_project_c = vs.get_variable(
+              'project_c_{}'.format(j), [input_sz, conf.num_units],
+              dtype=inputs.dtype)
+          c_prev[j] = math_ops.matmul(input_splits[i], input_project_c)
+
+  def _cell_state_size(self):
+    """Total size of the state of the inner cell used in this grid.
+
+    Returns:
+      Total size of the state of the inner cell.
+    """
+    state_sizes = self._cells[0].state_size
+    if isinstance(state_sizes, tuple):
+      return sum(state_sizes)
+    return state_sizes
+
 
 """Specialized cells, for convenience
 """
@@ -223,11 +324,17 @@ class GridRNNCell(rnn.RNNCell):
 class Grid1BasicRNNCell(GridRNNCell):
   """1D BasicRNN cell"""
 
-  def __init__(self, num_units):
+  def __init__(self, num_units, state_is_tuple=True, output_is_tuple=True):
     super(Grid1BasicRNNCell, self).__init__(
-        num_units=num_units, num_dims=1,
-        input_dims=0, output_dims=0, priority_dims=0, tied=False,
-        cell_fn=lambda n, i: rnn.BasicRNNCell(num_units=n, input_size=i))
+        num_units=num_units,
+        num_dims=1,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=False,
+        cell_fn=lambda n: rnn.BasicRNNCell(num_units=n),
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid2BasicRNNCell(GridRNNCell):
@@ -240,71 +347,112 @@ class Grid2BasicRNNCell(GridRNNCell):
   specified.
   """
 
-  def __init__(self, num_units, tied=False, non_recurrent_fn=None):
+  def __init__(self,
+               num_units,
+               tied=False,
+               non_recurrent_fn=None,
+               state_is_tuple=True,
+               output_is_tuple=True):
     super(Grid2BasicRNNCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
         non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.BasicRNNCell(num_units=n, input_size=i),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=lambda n: rnn.BasicRNNCell(num_units=n),
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid1BasicLSTMCell(GridRNNCell):
-  """1D BasicLSTM cell"""
+  """1D BasicLSTM cell."""
 
-  def __init__(self, num_units, forget_bias=1):
+  def __init__(self,
+               num_units,
+               forget_bias=1,
+               state_is_tuple=True,
+               output_is_tuple=True):
+    def cell_fn(n):
+      return rnn.BasicLSTMCell(num_units=n, forget_bias=forget_bias)
     super(Grid1BasicLSTMCell, self).__init__(
-        num_units=num_units, num_dims=1,
-        input_dims=0, output_dims=0, priority_dims=0, tied=False,
-        cell_fn=lambda n, i: rnn.BasicLSTMCell(
-            num_units=n,
-            forget_bias=forget_bias, input_size=i,
-            state_is_tuple=False))
+        num_units=num_units,
+        num_dims=1,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=False,
+        cell_fn=cell_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid2BasicLSTMCell(GridRNNCell):
-  """2D BasicLSTM cell
+  """2D BasicLSTM cell.
 
-    This creates a 2D cell which receives input and gives output in the first
-    dimension.
+  This creates a 2D cell which receives input and gives output in the first
+  dimension.
 
-    The first dimension can optionally be non-recurrent if `non_recurrent_fn` is
-    specified.
+  The first dimension can optionally be non-recurrent if `non_recurrent_fn` is
+  specified.
   """
 
   def __init__(self,
                num_units,
                tied=False,
                non_recurrent_fn=None,
-               forget_bias=1):
+               forget_bias=1,
+               state_is_tuple=True,
+               output_is_tuple=True):
+    def cell_fn(n):
+      return rnn.BasicLSTMCell(num_units=n, forget_bias=forget_bias)
     super(Grid2BasicLSTMCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
         non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.BasicLSTMCell(
-            num_units=n, forget_bias=forget_bias, input_size=i,
-            state_is_tuple=False),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=cell_fn,
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid1LSTMCell(GridRNNCell):
-  """1D LSTM cell
+  """1D LSTM cell.
 
-    This is different from Grid1BasicLSTMCell because it gives options to
-    specify the forget bias and enabling peepholes
+  This is different from Grid1BasicLSTMCell because it gives options to
+  specify the forget bias and to enable peepholes.
   """
 
-  def __init__(self, num_units, use_peepholes=False, forget_bias=1.0):
+  def __init__(self,
+               num_units,
+               use_peepholes=False,
+               forget_bias=1.0,
+               state_is_tuple=True,
+               output_is_tuple=True):
+
+    def cell_fn(n):
+      return rnn.LSTMCell(
+          num_units=n, forget_bias=forget_bias, use_peepholes=use_peepholes)
+
     super(Grid1LSTMCell, self).__init__(
-        num_units=num_units, num_dims=1,
-        input_dims=0, output_dims=0, priority_dims=0,
-        cell_fn=lambda n, i: rnn.LSTMCell(
-            num_units=n, input_size=i, use_peepholes=use_peepholes,
-            forget_bias=forget_bias, state_is_tuple=False))
+        num_units=num_units,
+        num_dims=1,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        cell_fn=cell_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid2LSTMCell(GridRNNCell):
-  """2D LSTM cell
+  """2D LSTM cell.
 
     This creates a 2D cell which receives input and gives output in the first
     dimension.
@@ -317,19 +465,30 @@ class Grid2LSTMCell(GridRNNCell):
                tied=False,
                non_recurrent_fn=None,
                use_peepholes=False,
-               forget_bias=1.0):
+               forget_bias=1.0,
+               state_is_tuple=True,
+               output_is_tuple=True):
+
+    def cell_fn(n):
+      return rnn.LSTMCell(
+          num_units=n, forget_bias=forget_bias, use_peepholes=use_peepholes)
+
     super(Grid2LSTMCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
         non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.LSTMCell(
-            num_units=n, input_size=i, forget_bias=forget_bias,
-            use_peepholes=use_peepholes, state_is_tuple=False),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=cell_fn,
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid3LSTMCell(GridRNNCell):
-  """3D BasicLSTM cell
+  """3D LSTM cell.
 
     This creates a 2D cell which receives input and gives output in the first
     dimension.
@@ -343,19 +502,30 @@ class Grid3LSTMCell(GridRNNCell):
                tied=False,
                non_recurrent_fn=None,
                use_peepholes=False,
-               forget_bias=1.0):
+               forget_bias=1.0,
+               state_is_tuple=True,
+               output_is_tuple=True):
+
+    def cell_fn(n):
+      return rnn.LSTMCell(
+          num_units=n, forget_bias=forget_bias, use_peepholes=use_peepholes)
+
     super(Grid3LSTMCell, self).__init__(
-        num_units=num_units, num_dims=3,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=3,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
         non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.LSTMCell(
-            num_units=n, input_size=i, forget_bias=forget_bias,
-            use_peepholes=use_peepholes, state_is_tuple=False),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=cell_fn,
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
 class Grid2GRUCell(GridRNNCell):
-  """2D LSTM cell
+  """2D GRU cell.
 
     This creates a 2D cell which receives input and gives output in the first
     dimension.
@@ -363,21 +533,31 @@ class Grid2GRUCell(GridRNNCell):
     specified.
   """
 
-  def __init__(self, num_units, tied=False, non_recurrent_fn=None):
+  def __init__(self,
+               num_units,
+               tied=False,
+               non_recurrent_fn=None,
+               state_is_tuple=True,
+               output_is_tuple=True):
     super(Grid2GRUCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
         non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.GRUCell(num_units=n, input_size=i),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=lambda n: rnn.GRUCell(num_units=n),
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)
 
 
-"""Helpers
-"""
+# Helpers
 
-_GridRNNDimension = namedtuple(
-    '_GridRNNDimension',
-    ['idx', 'is_input', 'is_output', 'is_priority', 'non_recurrent_fn'])
+_GridRNNDimension = namedtuple('_GridRNNDimension', [
+    'idx', 'is_input', 'is_output', 'is_priority', 'non_recurrent_fn'
+])
 
 _GridRNNConfig = namedtuple('_GridRNNConfig',
                             ['num_dims', 'dims', 'inputs', 'outputs',
@@ -387,7 +567,6 @@ _GridRNNConfig = namedtuple('_GridRNNConfig',
 
 def _parse_rnn_config(num_dims, ls_input_dims, ls_output_dims, ls_priority_dims,
                       ls_non_recurrent_dims, non_recurrent_fn, tied, num_units):
-
   def check_dim_list(ls):
     if ls is None:
       ls = []
@@ -412,8 +591,8 @@ def _parse_rnn_config(num_dims, ls_input_dims, ls_output_dims, ls_priority_dims,
             is_input=(i in input_dims),
             is_output=(i in output_dims),
             is_priority=(i in priority_dims),
-            non_recurrent_fn=non_recurrent_fn if i in non_recurrent_dims else
-            None))
+            non_recurrent_fn=non_recurrent_fn
+            if i in non_recurrent_dims else None))
   return _GridRNNConfig(
       num_dims=num_dims,
       dims=rnn_dims,
@@ -440,34 +619,40 @@ def _propagate(dim_indices, conf, cells, c_prev, m_prev, new_output, new_state,
   if conf.num_dims > 1:
     ls_cell_inputs = [None] * (conf.num_dims - 1)
     for d in conf.dims[:-1]:
-      ls_cell_inputs[d.idx] = new_output[d.idx] if new_output[
-          d.idx] is not None else m_prev[d.idx]
+      if new_output[d.idx] is None:
+        ls_cell_inputs[d.idx] = m_prev[d.idx]
+      else:
+        ls_cell_inputs[d.idx] = new_output[d.idx]
     cell_inputs = array_ops.concat(ls_cell_inputs, 1)
   else:
     cell_inputs = array_ops.zeros([m_prev[0].get_shape().as_list()[0], 0],
                                   m_prev[0].dtype)
 
-  last_dim_output = new_output[-1] if new_output[-1] is not None else m_prev[-1]
+  last_dim_output = (new_output[-1]
+                     if new_output[-1] is not None else m_prev[-1])
 
   for i in dim_indices:
     d = conf.dims[i]
     if d.non_recurrent_fn:
-      linear_args = array_ops.concat(
-          [cell_inputs, last_dim_output],
-          1) if conf.num_dims > 1 else last_dim_output
+      if conf.num_dims > 1:
+        linear_args = array_ops.concat([cell_inputs, last_dim_output], 1)
+      else:
+        linear_args = last_dim_output
       with vs.variable_scope('non_recurrent' if conf.tied else
                              'non_recurrent/cell_{}'.format(i)):
         if conf.tied and not (first_call and i == dim_indices[0]):
           vs.get_variable_scope().reuse_variables()
-        new_output[d.idx] = layers.legacy_fully_connected(
+
+        new_output[d.idx] = layers.fully_connected(
             linear_args,
-            num_output_units=conf.num_units,
+            num_outputs=conf.num_units,
             activation_fn=d.non_recurrent_fn,
-            weight_init=vs.get_variable_scope().initializer or
-            layers.initializers.xavier_initializer)
+            weights_initializer=(vs.get_variable_scope().initializer or
+                                 layers.initializers.xavier_initializer),
+            weights_regularizer=vs.get_variable_scope().regularizer)
     else:
       if c_prev[i] is not None:
-        cell_state = array_ops.concat([c_prev[i], last_dim_output], 1)
+        cell_state = (c_prev[i], last_dim_output)
       else:
         # for GRU/RNN, the state is just the previous output
         cell_state = last_dim_output
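The `state_is_tuple` switch in the GridRNN hunks above changes how per-dimension states are packed. The following is a minimal pure-Python sketch of the two layouts, assuming LSTM-like inner cells whose state is `[c | m]` (each `num_units` wide) and GRU/RNN cells whose state is just `m`. Plain lists stand in for `batch x width` tensors; the helper names (`cell_state_size`, `split_concatenated_state`, `to_tuple_state`) are hypothetical and only mirror the slicing done in `_extract_states` and `_cell_state_size`.

```python
# Sketch only: plain Python lists stand in for batch x width tensors.


def cell_state_size(state_size):
    """Total width of one inner cell's state (tuple state sizes are summed)."""
    if isinstance(state_size, tuple):
        return sum(state_size)
    return state_size


def split_concatenated_state(state, recurrents, num_units, total_cell_size):
    """Slice a concatenated state into per-dimension c_prev / m_prev dicts.

    LSTM-like cells: each block is [c | m], so cell_output_size > 0.
    GRU/RNN cells: each block is just m, so cell_output_size == 0.
    """
    cell_output_size = total_cell_size - num_units
    width = total_cell_size * len(recurrents)
    c_prev, m_prev = {}, {}
    for dim, start in zip(recurrents, range(0, width, total_cell_size)):
        if cell_output_size > 0:
            c_prev[dim] = [row[start:start + num_units] for row in state]
            m_prev[dim] = [
                row[start + num_units:start + num_units + cell_output_size]
                for row in state
            ]
        else:
            m_prev[dim] = [row[start:start + num_units] for row in state]
    return c_prev, m_prev


def to_tuple_state(state, recurrents, num_units, total_cell_size):
    """Re-pack a concatenated state into the tuple layout: one entry per
    recurrent dimension, a (c, m) pair for LSTM-like cells, else just m."""
    c_prev, m_prev = split_concatenated_state(
        state, recurrents, num_units, total_cell_size)
    if total_cell_size > num_units:
        return tuple((c_prev[d], m_prev[d]) for d in recurrents)
    return tuple(m_prev[d] for d in recurrents)


# Two recurrent dimensions, LSTM inner cells with num_units=2, batch of 1.
size = cell_state_size((2, 2))       # (c, h) sizes summed -> 4
flat = [[1, 2, 3, 4, 5, 6, 7, 8]]    # concatenated along the column axis
print(to_tuple_state(flat, [0, 1], num_units=2, total_cell_size=size))
# (([[1, 2]], [[3, 4]]), ([[5, 6]], [[7, 8]]))
```

The tuple layout avoids the slicing arithmetic entirely, which is one plausible reason the docstrings describe the concatenated form as slower.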
diff --git a/tensorflow/contrib/image/python/kernel_tests/image_ops_test.py b/tensorflow/contrib/image/python/kernel_tests/image_ops_test.py
index 4ce33de24a..33bd30b4e8 100644
--- a/tensorflow/contrib/image/python/kernel_tests/image_ops_test.py
+++ b/tensorflow/contrib/image/python/kernel_tests/image_ops_test.py
@@ -25,6 +25,7 @@ from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
 from tensorflow.python.framework import test_util
 from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import gradient_checker
 from tensorflow.python.ops import math_ops
 from tensorflow.python.platform import googletest
 
@@ -110,6 +111,30 @@ class ImageOpsTest(test_util.TensorFlowTestCase):
                              [0, 1, 0, 1],
                              [0, 1, 1, 1]])
 
+  def _test_grad(self, shape_to_test):
+    with self.test_session():
+      test_image_shape = shape_to_test
+      test_image = np.random.randn(*test_image_shape)
+      test_image_tensor = constant_op.constant(
+          test_image, shape=test_image_shape)
+      test_transform = image_ops.angles_to_projective_transforms(
+          np.pi / 2, 4, 4)
+
+      output_shape = test_image_shape
+      output = image_ops.transform(test_image_tensor, test_transform)
+      left_err = gradient_checker.compute_gradient_error(
+          test_image_tensor,
+          test_image_shape,
+          output,
+          output_shape,
+          x_init_value=test_image)
+      self.assertLess(left_err, 1e-10)
+
+  def test_grad(self):
+    self._test_grad([16, 16])
+    self._test_grad([4, 12, 12])
+    self._test_grad([3, 4, 12, 12])
+
 
 if __name__ == "__main__":
   googletest.main()
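The new `_test_grad` compares a registered gradient against a numerical estimate using `gradient_checker.compute_gradient_error`. Below is a toy pure-Python sketch of that kind of check, assuming the warp is a 90-degree rotation of a small square image (a pure pixel permutation, like the `np.pi / 2` transform in the test). The function names are hypothetical. For a permutation, the backward pass (the transpose) is also the inverse of the forward pass, which the usage lines check.

```python
# Sketch only: a finite-difference gradient check on a toy pixel permutation.


def rot90_perm(n):
    """Index map for rotating a flattened (row-major) n x n image by 90 degrees:
    output pixel k reads input pixel perm[k]."""
    return [c * n + (n - 1 - r) for r in range(n) for c in range(n)]


def forward(x, perm):
    """Apply the warp: gather input pixels into output order."""
    return [x[p] for p in perm]


def backward(dy, perm):
    """Transpose of forward: scatter each output gradient back to its source."""
    dx = [0.0] * len(dy)
    for k, p in enumerate(perm):
        dx[p] += dy[k]
    return dx


def gradient_error(x, perm, eps=1e-3):
    """Max |theoretical - numerical| over all Jacobian entries."""
    n = len(x)
    worst = 0.0
    for j in range(n):  # one output coordinate at a time
        onehot = [1.0 if k == j else 0.0 for k in range(n)]
        theo = backward(onehot, perm)  # d y_j / d x_i for every i
        for i in range(n):
            xp = list(x)
            xp[i] += eps
            xm = list(x)
            xm[i] -= eps
            num = (forward(xp, perm)[j] - forward(xm, perm)[j]) / (2 * eps)
            worst = max(worst, abs(theo[i] - num))
    return worst


perm = rot90_perm(4)
x = [float(v) for v in range(16)]
print(backward(forward(x, perm), perm) == x)  # transpose undoes the warp
print(gradient_error(x, perm))
```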
diff --git a/tensorflow/contrib/image/python/ops/image_ops.py b/tensorflow/contrib/image/python/ops/image_ops.py
index 889f361b19..9efdb0d521 100644
--- a/tensorflow/contrib/image/python/ops/image_ops.py
+++ b/tensorflow/contrib/image/python/ops/image_ops.py
@@ -24,6 +24,7 @@ from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
 from tensorflow.python.framework import ops
 from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import linalg_ops
 from tensorflow.python.ops import math_ops
 from tensorflow.python.platform import resource_loader
 
@@ -214,4 +215,41 @@ def _transform_matrices_to_flat(transform_matrices):
   return transforms[:, :8]
 
 
-ops.NotDifferentiable("ImageProjectiveTransform")
+@ops.RegisterGradient("ImageProjectiveTransform")
+def _image_projective_transform_grad(op, grad):
+  """Computes the gradient for ImageProjectiveTransform."""
+  images = op.inputs[0]
+  transforms = op.inputs[1]
+
+  image_or_images = ops.convert_to_tensor(images, name="images")
+  transform_or_transforms = ops.convert_to_tensor(
+      transforms, name="transforms", dtype=dtypes.float32)
+
+  if image_or_images.dtype.base_dtype not in _IMAGE_DTYPES:
+    raise TypeError("Invalid dtype %s." % image_or_images.dtype)
+  if len(image_or_images.get_shape()) == 2:
+    images = image_or_images[None, :, :, None]
+  elif len(image_or_images.get_shape()) == 3:
+    images = image_or_images[None, :, :, :]
+  elif len(image_or_images.get_shape()) == 4:
+    images = image_or_images
+  else:
+    raise TypeError("Images should have rank between 2 and 4")
+  if len(transform_or_transforms.get_shape()) == 1:
+    transforms = transform_or_transforms[None]
+  elif len(transform_or_transforms.get_shape()) == 2:
+    transforms = transform_or_transforms
+  else:
+    raise TypeError("Transforms should have rank 1 or 2.")
+
+  # Invert transformations
+  transforms = _flat_transforms_to_matrices(transforms=transforms)
+  inverse = linalg_ops.matrix_inverse(transforms)
+  transforms = _transform_matrices_to_flat(inverse)
+  output = gen_image_ops.image_projective_transform(grad, transforms)
+  if len(image_or_images.get_shape()) == 2:
+    return [output[0, :, :, 0], None]
+  elif len(image_or_images.get_shape()) == 3:
+    return [output[0, :, :, :], None]
+  else:
+    return [output, None]
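The new gradient inverts each transform by expanding the 8-element flat form to a 3x3 matrix, calling `linalg_ops.matrix_inverse`, and flattening back. Here is a minimal pure-Python sketch of that round trip. It assumes the flat layout `[a0, a1, a2, b0, b1, b2, c0, c1]` with an implicit final entry of 1, and that flattening divides by the last entry (as the visible `return transforms[:, :8]` context suggests). The helper names are hypothetical, and the adjugate formula replaces `matrix_inverse`.

```python
# Sketch only: flat projective transform <-> 3x3 matrix, and its inverse.


def flat_to_matrix(t):
    """[a0, a1, a2, b0, b1, b2, c0, c1] -> 3x3 rows, implicit last entry 1."""
    v = list(t) + [1.0]
    return [v[0:3], v[3:6], v[6:9]]


def matrix_to_flat(m):
    """Flatten, divide by the last entry (normally 1), and drop it."""
    v = [x for row in m for x in row]
    return [x / v[8] for x in v[:8]]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("transform is not invertible")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def invert_flat_transform(t):
    return matrix_to_flat(inverse3(flat_to_matrix(t)))


# A pure translation by (3, -2) inverts to a translation by (-3, 2).
print(invert_flat_transform([1, 0, 3, 0, 1, -2, 0, 0]))
```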
diff --git a/tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm b/tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
index 20c49d5b6a..27df3d3d71 100644
--- a/tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
+++ b/tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
@@ -323,10 +323,10 @@ didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       auto predictions = output->flat<float>();
 
       NSMutableDictionary *newValues = [NSMutableDictionary dictionary];
-      for (int index = 0; index < predictions.size(); index += 1) {
+      for (int index = 0; index < predictions.size(); ++index) {
         const float predictionValue = predictions(index);
         if (predictionValue > 0.05f) {
-          std::string label = labels[index % predictions.size()];
+          std::string label = labels[index];
           NSString *labelObject = [NSString stringWithUTF8String:label.c_str()];
           NSNumber *valueObject = [NSNumber numberWithFloat:predictionValue];
           [newValues setObject:valueObject forKey:labelObject];
diff --git a/tensorflow/contrib/keras/python/keras/initializers_test.py b/tensorflow/contrib/keras/python/keras/initializers_test.py
index c9f50c28ea..0a07eddd89 100644
--- a/tensorflow/contrib/keras/python/keras/initializers_test.py
+++ b/tensorflow/contrib/keras/python/keras/initializers_test.py
@@ -120,7 +120,7 @@ class KerasInitializersTest(test.TestCase):
                    target_mean=0., target_std=None, target_max=2 * scale)
 
   def test_orthogonal(self):
-    tensor_shape = (10, 10)
+    tensor_shape = (20, 20)
     with self.test_session():
       self._runner(keras.initializers.orthogonal(seed=123), tensor_shape,
                    target_mean=0.)
diff --git a/tensorflow/contrib/layers/python/layers/encoders.py b/tensorflow/contrib/layers/python/layers/encoders.py
index 8b6abb4b45..89c9d37bd0 100644
--- a/tensorflow/contrib/layers/python/layers/encoders.py
+++ b/tensorflow/contrib/layers/python/layers/encoders.py
@@ -121,7 +121,7 @@ def embed_sequence(ids,
     `Tensor` of `[batch_size, doc_length, embed_dim]` with embedded sequences.
 
   Raises:
-    ValueError: if `embed_dim` or `vocab_size` are not specified when not
+    ValueError: if `embed_dim` or `vocab_size` are not specified when
       `reuse` is `None` or `False`.
   """
   if not (reuse or (vocab_size and embed_dim)):
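The corrected `embed_sequence` docstring now matches the guard shown on the last context line: `ValueError` is raised only when `reuse` is falsy and at least one of `vocab_size` or `embed_dim` is missing. A tiny sketch of that condition, using a hypothetical helper name:

```python
def raises_value_error(reuse, vocab_size, embed_dim):
    """Mirror of the guard `not (reuse or (vocab_size and embed_dim))`."""
    return not (reuse or (vocab_size and embed_dim))


print(raises_value_error(None, None, 10))   # sizes missing, not reusing
print(raises_value_error(False, 100, 10))   # both sizes given
print(raises_value_error(True, None, None))  # reusing an existing variable
```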
diff --git a/tensorflow/contrib/layers/python/layers/initializers.py b/tensorflow/contrib/layers/python/layers/initializers.py
index 811e7fa7aa..271b3c01ff 100644
--- a/tensorflow/contrib/layers/python/layers/initializers.py
+++ b/tensorflow/contrib/layers/python/layers/initializers.py
@@ -34,9 +34,10 @@ def xavier_initializer(uniform=True, seed=None, dtype=dtypes.float32):
   This function implements the weight initialization from:
 
   Xavier Glorot and Yoshua Bengio (2010):
-           Understanding the difficulty of training deep feedforward neural
+           [Understanding the difficulty of training deep feedforward neural
            networks. International conference on artificial intelligence and
-           statistics.
+           statistics.](
+           http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)
 
   This initializer is designed to keep the scale of the gradients roughly the
   same in all layers. In uniform distribution this ends up being the range:
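The `xavier_initializer` hunk stops just before the range itself. For reference, the standard Glorot and Bengio uniform scheme samples from `[-x, x]` with `x = sqrt(6 / (fan_in + fan_out))`. The sketch below assumes that is the range the docstring goes on to give and reads `fan_in` and `fan_out` directly from a 2-D weight shape. It is a plain-Python illustration with hypothetical names, not the contrib implementation.

```python
import math
import random


def xavier_uniform_limit(fan_in, fan_out):
    """Half-width x of the Glorot/Bengio uniform range [-x, x]."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform_matrix(fan_in, fan_out, seed=None):
    """Sample a fan_in x fan_out weight matrix from [-x, x]."""
    rng = random.Random(seed)
    limit = xavier_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


print(xavier_uniform_limit(3, 3))      # sqrt(6 / 6)
w = xavier_uniform_matrix(100, 200, seed=0)
```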
diff --git a/tensorflow/contrib/learn/__init__.py b/tensorflow/contrib/learn/__init__.py
index bd56066b1b..05c4024d0b 100644
--- a/tensorflow/contrib/learn/__init__.py
+++ b/tensorflow/contrib/learn/__init__.py
@@ -38,6 +38,7 @@ See the @{$python/contrib.learn} guide.
 @@LinearEstimator
 @@LinearRegressor
 @@LogisticRegressor
+@@StateSavingRnnEstimator
 @@SVM
 @@SKCompat
 
diff --git a/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py b/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py
index 0ff5d6e8dc..53c71c6f3e 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py
@@ -35,11 +35,11 @@ from tensorflow.contrib.learn.python.learn.estimators import prediction_key
 from tensorflow.contrib.learn.python.learn.utils import export
 from tensorflow.python.framework import ops
 from tensorflow.python.ops import control_flow_ops
-from tensorflow.python.ops import logging_ops
 from tensorflow.python.ops import nn
 from tensorflow.python.ops import partitioned_variables
 from tensorflow.python.ops import state_ops
 from tensorflow.python.ops import variable_scope
+from tensorflow.python.summary import summary
 from tensorflow.python.training import sync_replicas_optimizer
 from tensorflow.python.training import training_util
 
@@ -99,10 +99,14 @@ def _linear_learning_rate(num_linear_feature_columns):
   return min(_LINEAR_LEARNING_RATE, default_learning_rate)
 
 
+def _add_hidden_layer_summary(value, tag):
+  summary.scalar("%s/fraction_of_zero_values" % tag, nn.zero_fraction(value))
+  summary.histogram("%s/activation" % tag, value)
+
+
 def _add_layer_summary(value, tag):
-  logging_ops.scalar_summary("%s/fraction_of_zero_values" % tag,
-                             nn.zero_fraction(value))
-  logging_ops.histogram_summary("%s/activation" % tag, value)
+  summary.scalar("%s/fraction_of_zero_values" % tag, nn.zero_fraction(value))
+  summary.histogram("%s/activation" % tag, value)
 
 
 def _get_embedding_variable(column, collection_key, input_layer_scope):
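Both summary helpers above now go through `tensorflow.python.summary.summary` instead of the removed `logging_ops` wrappers. Each records two things per layer: a scalar `<tag>/fraction_of_zero_values` and a histogram `<tag>/activation`. The scalar comes from `nn.zero_fraction`, which returns the fraction of entries that are exactly zero. This is commonly read as an activation-sparsity measure for ReLU layers. The following is a pure-Python sketch of that quantity and of the tag names. `zero_fraction_py` and `summary_tags` are hypothetical names, and raising on empty input is this sketch's own choice.

```python
def zero_fraction_py(values):
  """Fraction of entries that are exactly 0, analogous to nn.zero_fraction."""
  values = list(values)
  if not values:
    # Sketch-specific choice: refuse to divide by zero on empty input.
    raise ValueError('zero_fraction_py needs at least one value')
  return sum(1 for v in values if v == 0) / float(len(values))


def summary_tags(tag):
  """Tag names the layer-summary helpers build for one layer."""
  return ('%s/fraction_of_zero_values' % tag, '%s/activation' % tag)
```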
diff --git a/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator.py b/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator.py
index 525f84d511..d86ef8d477 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator.py
@@ -19,7 +19,6 @@ from __future__ import division
 from __future__ import print_function
 
 from tensorflow.contrib import layers
-from tensorflow.contrib.framework.python.framework import deprecated
 from tensorflow.contrib.layers.python.layers import optimizers
 from tensorflow.contrib.learn.python.learn.estimators import constants
 from tensorflow.contrib.learn.python.learn.estimators import estimator
@@ -540,20 +539,6 @@ def _get_dynamic_rnn_model_fn(
   return _dynamic_rnn_model_fn
 
 
-def _get_dropout_and_num_units(num_units,
-                               num_rnn_layers,
-                               input_keep_probability,
-                               output_keep_probability):
-  """Helper function for deprecated factory functions."""
-  dropout_keep_probabilities = None
-  num_units = [num_units for _ in range(num_rnn_layers)]
-  if input_keep_probability or output_keep_probability:
-    dropout_keep_probabilities = ([input_keep_probability]
-                                  + [1.0] * (num_rnn_layers - 1)
-                                  + [output_keep_probability])
-  return dropout_keep_probabilities, num_units
-
-
 class DynamicRnnEstimator(estimator.Estimator):
 
   def __init__(self,
@@ -704,339 +689,3 @@ class DynamicRnnEstimator(estimator.Estimator):
         model_dir=model_dir,
         config=config,
         feature_engineering_fn=feature_engineering_fn)
-
-
-@deprecated('2017-04-01',
-            'multi_value_rnn_regressor is deprecated. '
-            'Please construct a DynamicRnnEstimator directly.')
-def multi_value_rnn_regressor(num_units,
-                              sequence_feature_columns,
-                              context_feature_columns=None,
-                              cell_type='basic_rnn',
-                              num_rnn_layers=1,
-                              optimizer_type='SGD',
-                              learning_rate=0.1,
-                              momentum=None,
-                              gradient_clipping_norm=5.0,
-                              input_keep_probability=None,
-                              output_keep_probability=None,
-                              model_dir=None,
-                              config=None,
-                              feature_engineering_fn=None):
-  """Creates a `DynamicRnnEstimator` for multi-value regression.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a sequence of continuous values.
-
-  Args:
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.LINEAR_REGRESSION,
-      prediction_type=rnn_common.PredictionType.MULTIPLE_VALUE,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
-
-
-@deprecated('2017-04-01',
-            'multi_value_rnn_classifier is deprecated. '
-            'Please construct a DynamicRNNEstimator directly.')
-def multi_value_rnn_classifier(num_classes,
-                               num_units,
-                               sequence_feature_columns,
-                               context_feature_columns=None,
-                               cell_type='basic_rnn',
-                               num_rnn_layers=1,
-                               optimizer_type='SGD',
-                               learning_rate=0.1,
-                               predict_probabilities=False,
-                               momentum=None,
-                               gradient_clipping_norm=5.0,
-                               input_keep_probability=None,
-                               output_keep_probability=None,
-                               model_dir=None,
-                               config=None,
-                               feature_engineering_fn=None):
-  """Creates a `DynamicRNNEstimator` for multi-value classification.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a sequence of classifications, along with
-  (optionally) a probability distribution over classes.
-
-  Args:
-    num_classes: The number of classes for categorization.
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    predict_probabilities: A boolean indicating whether to predict probabilities
-      for all classes.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.CLASSIFICATION,
-      prediction_type=rnn_common.PredictionType.MULTIPLE_VALUE,
-      num_classes=num_classes,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=predict_probabilities,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
-
-
-@deprecated('2017-04-01',
-            'single_value_rnn_regressor is deprecated. '
-            'Please construct a DynamicRnnEstimator directly.')
-def single_value_rnn_regressor(num_units,
-                               sequence_feature_columns,
-                               context_feature_columns=None,
-                               cell_type='basic_rnn',
-                               num_rnn_layers=1,
-                               optimizer_type='SGD',
-                               learning_rate=0.1,
-                               momentum=None,
-                               gradient_clipping_norm=5.0,
-                               input_keep_probability=None,
-                               output_keep_probability=None,
-                               model_dir=None,
-                               config=None,
-                               feature_engineering_fn=None):
-  """Creates a `DynamicRnnEstimator` for single-value regression.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a single continuous values.
-
-  Args:
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.LINEAR_REGRESSION,
-      prediction_type=rnn_common.PredictionType.SINGLE_VALUE,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
-
-
-@deprecated('2017-04-01',
-            'single_value_rnn_classifier is deprecated. '
-            'Please construct a DynamicRnnEstimator directly.')
-def single_value_rnn_classifier(num_classes,
-                                num_units,
-                                sequence_feature_columns,
-                                context_feature_columns=None,
-                                cell_type='basic_rnn',
-                                num_rnn_layers=1,
-                                optimizer_type='SGD',
-                                learning_rate=0.1,
-                                predict_probabilities=False,
-                                momentum=None,
-                                gradient_clipping_norm=5.0,
-                                input_keep_probability=None,
-                                output_keep_probability=None,
-                                model_dir=None,
-                                config=None,
-                                feature_engineering_fn=None):
-  """Creates a `DynamicRnnEstimator` for single-value classification.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a single classifications, along with
-  (optionally) a probability distribution over classes.
-
-  Args:
-    num_classes: The number of classes for categorization.
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    predict_probabilities: A boolean indicating whether to predict probabilities
-      for all classes.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.CLASSIFICATION,
-      prediction_type=rnn_common.PredictionType.SINGLE_VALUE,
-      num_classes=num_classes,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=predict_probabilities,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
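With the deprecated factories removed, callers construct `DynamicRnnEstimator` directly. The removed `_get_dropout_and_num_units` helper shows how the old scalar arguments mapped onto the constructor. `num_units` becomes a per-layer list. The two keep probabilities become a list of length `num_rnn_layers + 1`, with 1.0 for the interior layers.

The following is a standalone sketch of that mapping (`legacy_rnn_args` is a hypothetical name). It has two deliberate deviations from the removed helper:
- When only one probability is set, the removed helper left `None` in the other slot. This sketch substitutes 1.0 (no dropout).
- This sketch tests for `None` rather than truthiness, so a probability of 0.0 is not silently ignored.

```python
def legacy_rnn_args(num_units, num_rnn_layers,
                    input_keep_probability=None,
                    output_keep_probability=None):
  """Maps legacy factory arguments to DynamicRnnEstimator arguments.

  Returns:
    (dropout_keep_probabilities, num_units_list). The first element is None
    when neither probability is set, otherwise a list of length
    num_rnn_layers + 1.
  """
  num_units_list = [num_units] * num_rnn_layers
  if input_keep_probability is None and output_keep_probability is None:
    return None, num_units_list
  # Deviation from the removed helper: an unset probability means "keep all".
  first = 1.0 if input_keep_probability is None else input_keep_probability
  last = 1.0 if output_keep_probability is None else output_keep_probability
  dropout = [first] + [1.0] * (num_rnn_layers - 1) + [last]
  return dropout, num_units_list
```

The returned values go to the constructor's `num_units` and `dropout_keep_probabilities` keywords, as in the removed factories. Two other details from those factories also apply:
- Their `optimizer_type` argument was passed through as the constructor's `optimizer` keyword.
- `problem_type` / `prediction_type` must now be given explicitly, e.g. `constants.ProblemType.CLASSIFICATION` with `rnn_common.PredictionType.MULTIPLE_VALUE`.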
diff --git a/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator_test.py b/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator_test.py
index 58072500d1..6fc028ab70 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator_test.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator_test.py
@@ -410,56 +410,6 @@ class DynamicRnnEstimatorTest(test.TestCase):
       state_piece = prediction_dict[dynamic_rnn_estimator._get_state_name(i)]
       self.assertListEqual(list(state_piece.shape), [batch_size, state_size])
 
-  def testLegacyConstructor(self):
-    """Exercise legacy constructor function."""
-    num_units = 16
-    num_layers = 6
-    output_keep_prob = 0.9
-    input_keep_prob = 0.7
-    batch_size = 11
-    learning_rate = 0.1
-    train_sequence_length = 21
-    train_steps = 121
-
-    def get_input_fn(batch_size, sequence_length, state_dict, starting_step=0):
-
-      def input_fn():
-        sequence = constant_op.constant(
-            [[(starting_step + i + j) % 2 for j in range(sequence_length + 1)]
-             for i in range(batch_size)],
-            dtype=dtypes.int32)
-        labels = array_ops.slice(sequence, [0, 0],
-                                 [batch_size, sequence_length])
-        inputs = array_ops.expand_dims(
-            math_ops.to_float(
-                array_ops.slice(sequence, [0, 1], [batch_size, sequence_length
-                                                  ])), 2)
-        input_dict = state_dict
-        input_dict['inputs'] = inputs
-        return input_dict, labels
-
-      return input_fn
-
-    seq_columns = [feature_column.real_valued_column('inputs', dimension=1)]
-    config = run_config.RunConfig(tf_random_seed=21212)
-
-    model_dir = tempfile.mkdtemp()
-    sequence_estimator = dynamic_rnn_estimator.multi_value_rnn_classifier(
-        num_classes=2,
-        num_units=num_units,
-        num_rnn_layers=num_layers,
-        input_keep_probability=input_keep_prob,
-        output_keep_probability=output_keep_prob,
-        sequence_feature_columns=seq_columns,
-        learning_rate=learning_rate,
-        config=config,
-        model_dir=model_dir)
-
-    train_input_fn = get_input_fn(
-        batch_size, train_sequence_length, state_dict={})
-
-    sequence_estimator.fit(input_fn=train_input_fn, steps=train_steps)
-
   def testMultipleRuns(self):
     """Tests resuming training by feeding state."""
     cell_sizes = [4, 7]
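The removed `testLegacyConstructor` trained on a synthetic parity task:
- Row `i` of the sequence is `(starting_step + i + j) % 2` for `j` in `0..sequence_length`.
- Labels are the first `sequence_length` columns.
- Inputs are the last `sequence_length` columns.

As a result, each label is the complement of the input at the same step. Below is a list-based sketch of that generator, without TensorFlow (`make_parity_batch` is a hypothetical name). It copies `state_dict` instead of writing into it. The removed `input_fn` did `input_dict = state_dict` and then assigned into it, which mutates the caller's dict.

```python
def make_parity_batch(batch_size, sequence_length, state_dict, starting_step=0):
  """Returns (features, labels) mirroring the removed test's input_fn."""
  sequence = [[(starting_step + i + j) % 2 for j in range(sequence_length + 1)]
              for i in range(batch_size)]
  # Labels: columns [0, sequence_length).
  labels = [row[:sequence_length] for row in sequence]
  # Inputs: columns [1, sequence_length], shaped [batch, length, 1] as floats.
  inputs = [[[float(v)] for v in row[1:sequence_length + 1]]
            for row in sequence]
  features = dict(state_dict)  # Copy rather than mutate the caller's dict.
  features['inputs'] = inputs
  return features, labels
```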
diff --git a/tensorflow/contrib/learn/python/learn/estimators/kmeans.py b/tensorflow/contrib/learn/python/learn/estimators/kmeans.py
index a0f501dfba..5be07d271d 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/kmeans.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/kmeans.py
@@ -27,11 +27,11 @@ from tensorflow.contrib.learn.python.learn.estimators import estimator
 from tensorflow.contrib.learn.python.learn.estimators.model_fn import ModelFnOps
 from tensorflow.python.framework import ops
 from tensorflow.python.ops import array_ops
-from tensorflow.python.ops import logging_ops
 from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import state_ops
 from tensorflow.python.ops.control_flow_ops import with_dependencies
 from tensorflow.python.platform import tf_logging as logging
+from tensorflow.python.summary import summary
 from tensorflow.python.training import session_run_hook
 from tensorflow.python.training.session_run_hook import SessionRunArgs
 
@@ -118,7 +118,7 @@ def _kmeans_clustering_model_fn(features, labels, mode, params, config):
            'kmeans_plus_plus_num_retries')).training_graph()
   incr_step = state_ops.assign_add(variables.get_global_step(), 1)
   loss = math_ops.reduce_sum(losses, name=KMeansClustering.LOSS_OP_NAME)
-  logging_ops.scalar_summary('loss/raw', loss)
+  summary.scalar('loss/raw', loss)
   training_op = with_dependencies([training_op, incr_step], loss)
   predictions = {
       KMeansClustering.ALL_SCORES: all_scores[0],
@@ -257,4 +257,3 @@ class KMeansClustering(estimator.Estimator):
   def clusters(self):
     """Returns cluster centers."""
     return super(KMeansClustering, self).get_variable_value(self.CLUSTERS)
-
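The functional change in `kmeans.py` is the switch from `logging_ops.scalar_summary` to `summary.scalar` for `loss/raw`. That scalar is `reduce_sum` over the per-example `losses`. The sketch below assumes each per-example loss is the point's squared Euclidean distance to its nearest center (the default metric; the cosine-distance option is not modeled). Under that assumption, the logged value corresponds to the following. `kmeans_raw_loss` is a hypothetical name.

```python
def kmeans_raw_loss(points, centers):
  """Sum over points of the squared Euclidean distance to the nearest center."""
  def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
  return sum(min(sq_dist(p, c) for c in centers) for p in points)
```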
diff --git a/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator.py b/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator.py
index e09278bc63..02acd70812 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator.py
@@ -20,7 +20,6 @@ from __future__ import print_function
 
 from tensorflow.contrib import layers
 from tensorflow.contrib import rnn as rnn_cell
-from tensorflow.contrib.framework.python.framework import deprecated
 from tensorflow.contrib.layers.python.layers import feature_column_ops
 from tensorflow.contrib.layers.python.layers import optimizers
 from tensorflow.contrib.learn.python.learn.estimators import constants
@@ -652,180 +651,3 @@ class StateSavingRnnEstimator(estimator.Estimator):
         model_dir=model_dir,
         config=config,
         feature_engineering_fn=feature_engineering_fn)
-
-
-@deprecated('2017-04-01', 'multi_value_rnn_regressor is deprecated. '
-            'Please construct a StateSavingRnnEstimator directly.')
-def multi_value_rnn_regressor(num_units,
-                              num_unroll,
-                              batch_size,
-                              sequence_feature_columns,
-                              context_feature_columns=None,
-                              num_rnn_layers=1,
-                              optimizer_type='SGD',
-                              learning_rate=0.1,
-                              momentum=None,
-                              gradient_clipping_norm=5.0,
-                              dropout_keep_probabilities=None,
-                              model_dir=None,
-                              config=None,
-                              feature_engineering_fn=None,
-                              num_threads=3,
-                              queue_capacity=1000,
-                              seed=None):
-  """Creates a RNN `Estimator` that predicts sequences of values.
-
-  Args:
-    num_units: The size of the RNN cells.
-    num_unroll: Python integer, how many time steps to unroll at a time.
-      The input sequences of length `k` are then split into `k / num_unroll`
-      many segments.
-    batch_size: Python integer, the size of the minibatch.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    num_rnn_layers: Number of RNN layers.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer` or a string. Strings must be
-      one of 'Adagrad', 'Momentum' or 'SGD'.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    dropout_keep_probabilities: a list of dropout keep probabilities or `None`.
-        If given a list, it must have length `num_rnn_layers + 1`.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-    num_threads: The Python integer number of threads enqueuing input examples
-      into a queue. Defaults to 3.
-    queue_capacity: The max capacity of the queue in number of examples.
-      Needs to be at least `batch_size`. Defaults to 1000. When iterating
-      over the same input example multiple times reusing their keys the
-      `queue_capacity` must be smaller than the number of examples.
-    seed: Fixes the random seed used for generating input keys by the SQSS.
-  Returns:
-    An initialized `Estimator`.
-  """
-  num_units = [num_units for _ in range(num_rnn_layers)]
-  return StateSavingRnnEstimator(
-      constants.ProblemType.LINEAR_REGRESSION,
-      num_unroll,
-      batch_size,
-      sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_classes=None,
-      num_units=num_units,
-      cell_type='lstm',
-      optimizer_type=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=False,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      config=config,
-      feature_engineering_fn=feature_engineering_fn,
-      num_threads=num_threads,
-      queue_capacity=queue_capacity,
-      seed=seed)
-
-
-@deprecated('2017-04-01', 'multi_value_rnn_classifier is deprecated. '
-            'Please construct a StateSavingRnnEstimator directly.')
-def multi_value_rnn_classifier(num_classes,
-                               num_units,
-                               num_unroll,
-                               batch_size,
-                               sequence_feature_columns,
-                               context_feature_columns=None,
-                               num_rnn_layers=1,
-                               optimizer_type='SGD',
-                               learning_rate=0.1,
-                               predict_probabilities=False,
-                               momentum=None,
-                               gradient_clipping_norm=5.0,
-                               dropout_keep_probabilities=None,
-                               model_dir=None,
-                               config=None,
-                               feature_engineering_fn=None,
-                               num_threads=3,
-                               queue_capacity=1000,
-                               seed=None):
-  """Creates a RNN `Estimator` that predicts sequences of labels.
-
-  Args:
-    num_classes: The number of classes for categorization.
-    num_units: The size of the RNN cells.
-    num_unroll: Python integer, how many time steps to unroll at a time.
-      The input sequences of length `k` are then split into `k / num_unroll`
-      many segments.
-    batch_size: Python integer, the size of the minibatch.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    num_rnn_layers: Number of RNN layers.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer` or a string. Strings must be
-      one of 'Adagrad', 'Momentum' or 'SGD'.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    predict_probabilities: A boolean indicating whether to predict probabilities
-      for all classes.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    dropout_keep_probabilities: a list of dropout keep probabilities or `None`.
-        If given a list, it must have length `num_rnn_layers + 1`.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-    num_threads: The Python integer number of threads enqueuing input examples
-      into a queue. Defaults to 3.
-    queue_capacity: The max capacity of the queue in number of examples.
-      Needs to be at least `batch_size`. Defaults to 1000. When iterating
-      over the same input example multiple times reusing their keys the
-      `queue_capacity` must be smaller than the number of examples.
-    seed: Fixes the random seed used for generating input keys by the SQSS.
-  Returns:
-    An initialized `Estimator`.
-  """
-  num_units = [num_units for _ in range(num_rnn_layers)]
-  return StateSavingRnnEstimator(
-      constants.ProblemType.CLASSIFICATION,
-      num_unroll,
-      batch_size,
-      sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_classes=num_classes,
-      num_units=num_units,
-      cell_type='lstm',
-      optimizer_type=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=predict_probabilities,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      config=config,
-      feature_engineering_fn=feature_engineering_fn,
-      num_threads=num_threads,
-      queue_capacity=queue_capacity,
-      seed=seed)
diff --git a/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator_test.py b/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator_test.py
index feea6c5fed..886aab5b4f 100644
--- a/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator_test.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator_test.py
@@ -455,56 +455,6 @@ class LegacyConstructorTest(test.TestCase):
       return {'inputs': inputs}, labels
     return input_fn
 
-  def testClassifierConstructor(self):
-    batch_size = 16
-    num_classes = 2
-    num_unroll = 32
-    sequence_length = 32
-    num_units = 4
-    learning_rate = 0.5
-    steps = 100
-    input_fn = self._get_input_fn(sequence_length,
-                                  seed=1234)
-    model_dir = tempfile.mkdtemp()
-    seq_columns = [
-        feature_column.real_valued_column(
-            'inputs', dimension=num_units)
-    ]
-    estimator = ssre.multi_value_rnn_classifier(num_classes,
-                                                num_units,
-                                                num_unroll,
-                                                batch_size,
-                                                seq_columns,
-                                                learning_rate=learning_rate,
-                                                model_dir=model_dir,
-                                                queue_capacity=batch_size+2,
-                                                seed=1234)
-    estimator.fit(input_fn=input_fn, steps=steps)
-
-  def testRegressorConstructor(self):
-    batch_size = 16
-    num_unroll = 32
-    sequence_length = 32
-    num_units = 4
-    learning_rate = 0.5
-    steps = 100
-    input_fn = self._get_input_fn(sequence_length,
-                                  seed=4321)
-    model_dir = tempfile.mkdtemp()
-    seq_columns = [
-        feature_column.real_valued_column(
-            'inputs', dimension=num_units)
-    ]
-    estimator = ssre.multi_value_rnn_regressor(num_units,
-                                               num_unroll,
-                                               batch_size,
-                                               seq_columns,
-                                               learning_rate=learning_rate,
-                                               model_dir=model_dir,
-                                               queue_capacity=batch_size+2,
-                                               seed=1234)
-    estimator.fit(input_fn=input_fn, steps=steps)
-
 
 # TODO(jtbates): move all tests below to a benchmark test.
 class StateSavingRNNEstimatorLearningTest(test.TestCase):
diff --git a/tensorflow/contrib/rnn/__init__.py b/tensorflow/contrib/rnn/__init__.py
index a744878124..d2f9cde4f6 100644
--- a/tensorflow/contrib/rnn/__init__.py
+++ b/tensorflow/contrib/rnn/__init__.py
@@ -42,6 +42,8 @@ See @{$python/contrib.rnn} guide.
 @@GridLSTMCell
 @@BidirectionalGridLSTMCell
 @@NASCell
+@@UGRNNCell
+@@IntersectionRNNCell
 @@PhasedLSTMCell
 @@HighwayWrapper
 
diff --git a/tensorflow/contrib/signal/BUILD b/tensorflow/contrib/signal/BUILD
new file mode 100644
index 0000000000..5b65a6ae05
--- /dev/null
+++ b/tensorflow/contrib/signal/BUILD
@@ -0,0 +1,46 @@
+package(default_visibility = ["//tensorflow:__subpackages__"])
+
+licenses(["notice"])  # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("//tensorflow:tensorflow.bzl", "cuda_py_tests")
+
+py_library(
+    name = "signal_py",
+    srcs = ["__init__.py"] + glob(["python/ops/*.py"]),
+    srcs_version = "PY2AND3",
+    deps = [
+        "//tensorflow/python:array_ops",
+        "//tensorflow/python:framework",
+        "//tensorflow/python:math_ops",
+    ],
+)
+
+cuda_py_tests(
+    name = "shape_ops_test",
+    size = "small",
+    srcs = ["python/kernel_tests/shape_ops_test.py"],
+    additional_deps = [
+        ":signal_py",
+        "//third_party/py/numpy",
+        "//tensorflow/python:array_ops",
+        "//tensorflow/python:client_testlib",
+        "//tensorflow/python:framework",
+        "//tensorflow/python:framework_for_generated_wrappers",
+        "//tensorflow/python:framework_test_lib",
+        "//tensorflow/python:platform_test",
+    ],
+)
+
+filegroup(
+    name = "all_files",
+    srcs = glob(
+        ["**/*"],
+        exclude = [
+            "**/METADATA",
+            "**/OWNERS",
+        ],
+    ),
+    visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/contrib/signal/__init__.py b/tensorflow/contrib/signal/__init__.py
new file mode 100644
index 0000000000..9f906dd28e
--- /dev/null
+++ b/tensorflow/contrib/signal/__init__.py
@@ -0,0 +1,27 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""##Signal ops.
+
+@@frames
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.contrib.signal.python.ops.shape_ops import frames
+
+from tensorflow.python.util.all_util import remove_undocumented
+remove_undocumented(__name__)
diff --git a/tensorflow/contrib/signal/python/__init__.py b/tensorflow/contrib/signal/python/__init__.py
new file mode 100644
index 0000000000..e672d1146c
--- /dev/null
+++ b/tensorflow/contrib/signal/python/__init__.py
@@ -0,0 +1,19 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Signal ops."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
diff --git a/tensorflow/contrib/signal/python/kernel_tests/shape_ops_test.py b/tensorflow/contrib/signal/python/kernel_tests/shape_ops_test.py
new file mode 100644
index 0000000000..e07942875f
--- /dev/null
+++ b/tensorflow/contrib/signal/python/kernel_tests/shape_ops_test.py
@@ -0,0 +1,68 @@
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for shape_ops."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.contrib.signal.python.ops import shape_ops
+from tensorflow.python.framework import constant_op
+from tensorflow.python.framework import dtypes
+from tensorflow.python.ops import array_ops
+from tensorflow.python.platform import test
+
+
+class FramesTest(test.TestCase):
+
+  def test_mapping_of_indices_without_padding(self):
+    with self.test_session():
+      tensor = constant_op.constant(np.arange(9152), dtypes.int32)
+      tensor = array_ops.expand_dims(tensor, 0)
+
+      result = shape_ops.frames(tensor, 512, 180)
+      result = result.eval()
+
+      expected = np.tile(np.arange(512), (49, 1))
+      expected += np.tile(np.arange(49) * 180, (512, 1)).T
+
+      expected = np.expand_dims(expected, axis=0)
+      expected = np.array(expected, dtype=np.int32)
+
+      self.assertAllEqual(expected, result)
+
+  def test_mapping_of_indices_with_padding(self):
+    with self.test_session():
+      tensor = constant_op.constant(np.arange(10000), dtypes.int32)
+      tensor = array_ops.expand_dims(tensor, 0)
+
+      result = shape_ops.frames(tensor, 512, 192)
+      result = result.eval()
+
+      expected = np.tile(np.arange(512), (51, 1))
+      expected += np.tile(np.arange(51) * 192, (512, 1)).T
+
+      expected[expected >= 10000] = 0
+
+      expected = np.expand_dims(expected, axis=0)
+      expected = np.array(expected, dtype=np.int32)
+
+      self.assertAllEqual(expected, result)
+
+
+if __name__ == "__main__":
+  test.main()
diff --git a/tensorflow/contrib/signal/python/ops/__init__.py b/tensorflow/contrib/signal/python/ops/__init__.py
new file mode 100644
index 0000000000..e672d1146c
--- /dev/null
+++ b/tensorflow/contrib/signal/python/ops/__init__.py
@@ -0,0 +1,19 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Signal ops."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
diff --git a/tensorflow/contrib/signal/python/ops/shape_ops.py b/tensorflow/contrib/signal/python/ops/shape_ops.py
new file mode 100644
index 0000000000..4914f19be7
--- /dev/null
+++ b/tensorflow/contrib/signal/python/ops/shape_ops.py
@@ -0,0 +1,87 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""General shape ops for frames."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import ops
+
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import math_ops
+
+
+def frames(signal, frame_length, frame_step, name=None):
+  """Frame a signal into overlapping frames.
+
+  May be used as a preprocessing step before spectral functions.
+
+  For example:
+
+  ```python
+  pcm = tf.placeholder(tf.float32, [None, 9152])
+  frames = tf.contrib.signal.frames(pcm, 512, 180)
+  magspec = tf.abs(tf.spectral.rfft(frames, [512]))
+  image = tf.expand_dims(magspec, 3)
+  ```
+
+  Args:
+    signal: A `Tensor` of shape `[batch_size, signal_length]`.
+    frame_length: An `int32` or `int64` `Tensor`. The length of each frame.
+    frame_step: An `int32` or `int64` `Tensor`. The step between frames.
+    name: A name for the operation (optional).
+
+  Returns:
+    A `Tensor` of frames with shape `[batch_size, num_frames, frame_length]`.
+
+  Raises:
+    ValueError: if signal does not have rank 2.
+  """
+  with ops.name_scope(name, "frames", [signal, frame_length, frame_step]):
+    signal = ops.convert_to_tensor(signal, name="signal")
+    frame_length = ops.convert_to_tensor(frame_length, name="frame_length")
+    frame_step = ops.convert_to_tensor(frame_step, name="frame_step")
+
+    signal_rank = signal.shape.ndims
+
+    if signal_rank != 2:
+      raise ValueError("expected signal to have rank 2, got %s" % signal_rank)
+
+    signal_length = array_ops.shape(signal)[1]
+
+    num_frames = math_ops.ceil((signal_length - frame_length) / frame_step)
+    num_frames = 1 + math_ops.cast(num_frames, dtypes.int32)
+
+    pad_length = (num_frames - 1) * frame_step + frame_length
+    pad_signal = array_ops.pad(signal, [[0, 0], [0,
+                                                 pad_length - signal_length]])
+
+    indices_frame = array_ops.expand_dims(math_ops.range(frame_length), 0)
+    indices_frames = array_ops.tile(indices_frame, [num_frames, 1])
+
+    indices_step = array_ops.expand_dims(
+        math_ops.range(num_frames) * frame_step, 1)
+    indices_steps = array_ops.tile(indices_step, [1, frame_length])
+
+    indices = indices_frames + indices_steps
+
+    # TODO(androbin): remove `transpose` when `gather` gets `axis` support
+    pad_signal = array_ops.transpose(pad_signal)
+    signal_frames = array_ops.gather(pad_signal, indices)
+    signal_frames = array_ops.transpose(signal_frames, perm=[2, 0, 1])
+
+    return signal_frames
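The framing arithmetic in `frames` can be checked without TensorFlow. Below is a minimal pure-Python sketch of the same rule for a single batch row. `frames_reference` is a hypothetical helper, not part of this change, and it assumes `len(signal) >= frame_length`. The op as written does not guard against shorter signals.

```python
import math


def frames_reference(signal, frame_length, frame_step):
  """Splits one row of samples into overlapping, zero-padded frames.

  Hypothetical reference helper mirroring the arithmetic in `frames`.
  Assumes len(signal) >= frame_length.
  """
  signal_length = len(signal)
  # Same rule as the op: 1 + ceil((signal_length - frame_length) / frame_step).
  num_frames = 1 + math.ceil((signal_length - frame_length) / frame_step)
  # Zero-pad at the end so that the last frame is complete.
  pad_length = (num_frames - 1) * frame_step + frame_length
  padded = list(signal) + [0] * (pad_length - signal_length)
  # Frame i covers indices [i * frame_step, i * frame_step + frame_length),
  # i.e. row i of the index matrix the op builds with `range` and `tile`.
  return [padded[i * frame_step:i * frame_step + frame_length]
          for i in range(num_frames)]


# A 6-sample signal, frames of length 3, step 2: the last frame is padded.
print(frames_reference([1, 2, 3, 4, 5, 6], 3, 2))
# [[1, 2, 3], [3, 4, 5], [5, 6, 0]]
```

The op computes the same `[num_frames, frame_length]` index matrix, but it gathers along the time axis of the whole batch at once. Because `gather` has no `axis` argument yet (see the TODO), it transposes `[batch_size, length]` to `[length, batch_size]`, gathers, and transposes the result back with `perm=[2, 0, 1]`.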
diff --git a/tensorflow/contrib/slim/README.md b/tensorflow/contrib/slim/README.md
index c8842dd57b..61148c0b26 100644
--- a/tensorflow/contrib/slim/README.md
+++ b/tensorflow/contrib/slim/README.md
@@ -447,7 +447,7 @@ vgg = tf.contrib.slim.nets.vgg
 images, labels = ...
 
 # Create the model.
-predictions = vgg.vgg_16(images)
+predictions, _ = vgg.vgg_16(images)
 
 # Define the loss functions and get the total loss.
 loss = slim.losses.softmax_cross_entropy(predictions, labels)
diff --git a/tensorflow/contrib/slim/python/slim/data/README.md b/tensorflow/contrib/slim/python/slim/data/README.md
index 858c694990..fe15a10b99 100644
--- a/tensorflow/contrib/slim/python/slim/data/README.md
+++ b/tensorflow/contrib/slim/python/slim/data/README.md
@@ -71,27 +71,27 @@ for item in data_decoder.list_items():
   print(item)
 ```
 
-## Example: TFExampleDataDecoder
+## Example: TFExampleDecoder
 
 The
-[tfexample_data_decoder.py](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/tfexample_data_decoder.py)
+[tfexample_decoder.py](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py)
 is a data decoder which decodes serialized `TFExample` protocol buffers. A
 `TFExample` protocol buffer is a map from keys (strings) to either a
 `tf.FixedLenFeature` or `tf.VarLenFeature`. Consequently, to decode a
 `TFExample`, one must provide a mapping from one or more `TFExample` fields
-to each of the `items` that the `tfexample_data_decoder` can provide. For
+to each of the `items` that the `tfexample_decoder` can provide. For
 example, a dataset of `TFExamples` might store images in various formats and
 each `TFExample` might contain an `encoding` key and a `format` key which can
 be used to decode the image using the appropriate decoder (jpg, png, etc).
 
-To make this possible, the `tfexample_data_decoder` is constructed by specifying
+To make this possible, the `tfexample_decoder` is constructed by specifying
 the a map of `TFExample` keys to either `tf.FixedLenFeature` or
 `tf.VarLenFeature` as well as a set of `ItemHandlers`. An `ItemHandler`
 provides a mapping from `TFExample` keys to the item being provided. Because a
-`tfexample_data_decoder` might return multiple `items`, one often constructs a
-`tfexample_data_decoder` using multiple `ItemHandlers`.
+`tfexample_decoder` might return multiple `items`, one often constructs a
+`tfexample_decoder` using multiple `ItemHandlers`.
 
-`tfexample_data_decoder` provides some predefined `ItemHandlers` which take care
+`tfexample_decoder` provides some predefined `ItemHandlers` which take care
 of the common cases of mapping `TFExamples` to images, `Tensors` and
 `SparseTensors`. For example, the following specification might be
 used to decode a dataset of images:
diff --git a/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py b/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py
index f260ede348..bd811e3726 100644
--- a/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py
+++ b/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py
@@ -64,6 +64,8 @@ from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import nn_ops
 from tensorflow.python.ops import variable_scope
 
+resnet_arg_scope = resnet_utils.resnet_arg_scope
+
 
 @add_arg_scope
 def bottleneck(inputs,
diff --git a/tensorflow/contrib/testing/python/framework/fake_summary_writer.py b/tensorflow/contrib/testing/python/framework/fake_summary_writer.py
index 02f2d7fae8..f2065c6662 100644
--- a/tensorflow/contrib/testing/python/framework/fake_summary_writer.py
+++ b/tensorflow/contrib/testing/python/framework/fake_summary_writer.py
@@ -127,3 +127,6 @@ class FakeSummaryWriter(object):
 
   def reopen(self):
     pass
+
+  def close(self):
+    pass
diff --git a/tensorflow/contrib/verbs/README.md b/tensorflow/contrib/verbs/README.md
index 37a543dda8..da5f2b0223 100644
--- a/tensorflow/contrib/verbs/README.md
+++ b/tensorflow/contrib/verbs/README.md
@@ -1,54 +1,54 @@
-## How to compile and use Rdma-enabled tensorflow
-1. Follow the regular TF compilation instructions. During configure step, if you want ibverbs based Rdma support, answer yes to this question:
+## How to compile and use RDMA-enabled TensorFlow
+1. Follow the regular TF compilation instructions. During the configure step, if you want ibverbs-based RDMA support, answer yes to this question:
 
     ```Do you wish to build TensorFlow with VERBS-RDMA support [y/N]```
 
-2. To turn on Rdma connection, add the protocol "grpc+verbs" in server definition:
+2. To turn on the RDMA connection, add the protocol "grpc+verbs" to the server definition:
 
     ```server = tf.train.Server(cluster, job_name="local", task_index=0, protocol='grpc+verbs') # default protocol is 'grpc'```
 
 ## Overview
-The design is based on Tensorflow r1.0. An Rdma path is added between servers for tensor transfer (weights, gradients, etc). The existing GRPC path remains and is responsible for "administrative" tasks, such as setting up the Rdma path, exchanging computation graphs, etc.
+The design is based on TensorFlow r1.0. An RDMA path is added between servers for tensor transfer (weights, gradients, etc.). The existing GRPC path remains and is responsible for "administrative" tasks, such as setting up the RDMA path, exchanging computation graphs, etc.
 
-During the server setup, an Rdma manager is created to manage low-level Rdma components such as Rdma channel and Rdma adapter, an Rdma rendezvous manager is created to oversee send/recv operations between servers. Following the distributed Tensorflow design philosophy, the send operation is passive, i.e. merely placing a tensor in the local out-going table. It is the receive operation that actually initiates the tensor transfer.
+During the server setup, an RDMA manager is created to manage low-level RDMA components such as the RDMA channel and RDMA adapter, and an RDMA rendezvous manager is created to oversee send/recv operations between servers. Following the distributed TensorFlow design philosophy, the send operation is passive, i.e., it merely places a tensor in the local outgoing table. It is the receive operation that actually initiates the tensor transfer.
 
-Tensorflow dynamically allocates memory for tensors that are to be sent or received. This causes difficulty for Rdma operations where pinned memory is required. Two remedies are possible, either the memory is pinned, transfer, then unpinned for each and every tensor to be transferred, or a buffer is pre-allocated and pinned for each tensor. The former incurs significant operation overhead since pinning and unpinning memory for each dynamically generated tensor is slow. The latter incurs large memory overhead and extra copying from the tensor to its pinned buffer, but may still be faster than the former. The second approach is adopted in this design. Each Rdma channel, representing a Rdma connection to a peer, contains a table of pinned buffers for all the seen tensors that requires transfer. It is assumed that the tensor size rarely changes across different steps. So only one buffer is created for the same tensor across all the steps. In the rare case when the tensor size does increases, the old buffer is discarded and new buffer of larger size is created and pinned.
+TensorFlow dynamically allocates memory for tensors that are to be sent or received. This causes difficulty for RDMA operations, where pinned memory is required. Two remedies are possible: either the memory is pinned, transferred, then unpinned for each and every tensor to be transferred, or a buffer is pre-allocated and pinned for each tensor. The former incurs significant operation overhead, since pinning and unpinning memory for each dynamically generated tensor is slow. The latter incurs large memory overhead and extra copying from the tensor to its pinned buffer, but may still be faster than the former. The second approach is adopted in this design. Each RDMA channel, representing an RDMA connection to a peer, contains a table of pinned buffers for all the seen tensors that require transfer. It is assumed that the tensor size rarely changes across different steps, so only one buffer is created for the same tensor across all the steps. In the rare case when the tensor size does increase, the old buffer is discarded and a new buffer of larger size is created and pinned.
 
-When a tensor is prepared fro transfer, it is first converted to TensorProto, then the proto is serialized to byte array and copied to the pinned buffer. The content of the buffer is transferred to the remote node via Rdma write. On the remote side, the process is reversed. This is illustrated in the diagram below. The conversion of TensorProto is introduced to simplify transfer of string-tensors. Also since the TensorProto lives in host memory, even if the origin tensor lives in the device, the pinned buffers are all allocated in the host memory.
-![Tensorflow Rdma path](./design_diagram.png)
+When a tensor is prepared for transfer, it is first converted to TensorProto, then the proto is serialized to a byte array and copied to the pinned buffer. The content of the buffer is transferred to the remote node via RDMA write. On the remote side, the process is reversed. This is illustrated in the diagram below. The conversion to TensorProto is introduced to simplify the transfer of string tensors. Also, since the TensorProto lives in host memory, even if the original tensor lives on the device, the pinned buffers are all allocated in host memory.
+![TensorFlow RDMA path](./design_diagram.png)
 
 The following improvements can be made in the future. First, conversion to TensorProto and serialization can be avoided for numeric (float/int) tensors since their internal buffer can be access directly as byte array. Second, the pinned buffer may be allocated on device if the tensor is located in the device. This avoids extra device-to-host copy at the expense of extra device memory consumption.
 ## Design details
 
-### Rdma components
+### RDMA components
 
-* **Rdma adapter:** The base for Rdma communications. It may contain multiple channels and buffers.  It is responsible for handling various incoming Rdma messages.
-* **Rdma channel:** Responsible for Rdma connection to a particular node. It manages multiple buffers. A channel has a callback table which stores all the callbacks for the requested tensors.
-* **Rdma buffer:** Responsible for sending or receiving data. It has a fixed size memory to store the data. It has a queue to store the pending jobs. There are three types of buffers, message buffer, ACK buffer and tensor buffer. A channel has two message buffers, two ack buffers and many tensor buffers.
-* **Rdma manager:** Manages the adapter and channels, including channel creation, channel setup via GRPC service, channel lookup, etc.
-* **Rdma rendezvous manager:** manages multiple rdma rendezvous. 
-* **Rdma rendezvous:** a derived class of BaseRemoteRendezvous. This class is the back end for "send" and "recv" ops. When the sendrecv_op wants to send or receive a tensor, it calls the rendezvous' "send" and "recv" functions respectively. Rendezvous are identified by "step_id", a random number, so that tensors for different iterations don't get mixed up.
+* **RDMA adapter:** The base for RDMA communications. It may contain multiple channels and buffers. It is responsible for handling various incoming RDMA messages.
+* **RDMA channel:** Responsible for the RDMA connection to a particular node. It manages multiple buffers. A channel has a callback table which stores all the callbacks for the requested tensors.
+* **RDMA buffer:** Responsible for sending or receiving data. It has a fixed-size memory region to store the data and a queue to store the pending jobs. There are three types of buffers: message buffer, ACK buffer and tensor buffer. A channel has two message buffers, two ACK buffers and many tensor buffers.
+* **RDMA manager:** Manages the adapter and channels, including channel creation, channel setup via GRPC service, channel lookup, etc.
+* **RDMA rendezvous manager:** Manages multiple RDMA rendezvous.
+* **RDMA rendezvous:** A class derived from BaseRemoteRendezvous. This class is the back end for "send" and "recv" ops. When the sendrecv_op wants to send or receive a tensor, it calls the rendezvous' "send" and "recv" functions respectively. Rendezvous are identified by "step_id", a random number, so that tensors for different iterations don't get mixed up.
 
 ### The SEND operation
 
-In tensorflow, when rendezvous sends a tensor, it merely puts a tensor in a local table in the corresponding rendezvous. If the tensor has been requested, a callback exists in the table. "send" will activate the callback, which tries to send the tensor across the node.
+In TensorFlow, when the rendezvous sends a tensor, it merely puts the tensor in a local table in the corresponding rendezvous. If the tensor has already been requested, a callback exists in the table, and "send" activates the callback, which tries to send the tensor to the remote node.
 
 
 ### The RECV operation
 
-When a tensor is requested, rendezvous' recv function is called. The function first places a callback in the channel's callback table, which will be activated once the tensor is sent from the source. In the next step, a message is sent to notify the source of the requested tensor. Once the source receives the message, it will check locally for the tensor, if not found, a callback is placed in the table, otherwise, the tensor id will be placed at corresponding Rdma buffer's job queue for future transmission. When a tensor is scheduled to be transmitted, the Rdma buffer needs to have the memory allocated and initialized (registered with the remote buffer info). If the memory is not ready, the transmission is deferred, a message is sent to the destination to establish the memory first. The other case a transimssion can be deferred is when the buffer is still being used by an on-going transmission.
+When a tensor is requested, the rendezvous' recv function is called. The function first places a callback in the channel's callback table, which will be activated once the tensor is sent from the source. Next, a message is sent to notify the source of the requested tensor. Once the source receives the message, it checks locally for the tensor. If the tensor is not found, a callback is placed in the table; otherwise, the tensor id is placed in the corresponding RDMA buffer's job queue for future transmission. When a tensor is scheduled to be transmitted, the RDMA buffer needs to have its memory allocated and initialized (registered with the remote buffer info). If the memory is not ready, the transmission is deferred and a message is sent to the destination to establish the memory first. A transmission can also be deferred when the buffer is still being used by an ongoing transmission.
 
-### Three types of Rdma buffers
+### Three types of RDMA buffers
 
 * **Message buffer:** responsible for sending message only.
 * **Ack buffer:** once a message is sent, the recipient needs to send an ack via the ack buffer to free up the message buffer. An ack buffer is exclusively for its coupled message buffer.
 * **Tensor buffer:** responsible for sending tensors. The recipient needs to send back a message to free up the sending buffer.
 
-### Rdma packet format
+### RDMA packet format
 
 |type|name_size|name|step_id|buffer_size|remote_addr|rkey|is_dead|data_type|tensor_shape|tensor_bytes|tensor_buffer|
 
-### Six types of Rdma messages
+### Six types of RDMA messages
 * RDMA_MESSAGE_ACK
 * RDMA_MESSAGE_BUFFER_IDLE
 * RDMA_MESSAGE_BUFFER_REQUEST
@@ -56,7 +56,7 @@ When a tensor is requested, rendezvous' recv function is called. The function fi
 * RDMA_MESSAGE_TENSOR_REQUEST
 * RDMA_MESSAGE_TENSOR_WRITE
 
-### Actions upon receiving Rdma messages
+### Actions upon receiving RDMA messages
 * RDMA_MESSAGE_ACK
   * sender: mark local ack buffer idle.
   * receiver: mark remote message buffer idle, send next item.
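The request path described in this README can be modeled in a few lines. The sketch below is a hypothetical, single-process Python model, not the verbs implementation: `Source`, `Buffer`, and every method name are invented for illustration. It captures only the source-side decision logic (register a callback when the tensor is missing, enqueue it otherwise, and defer when the buffer memory is not ready or the buffer is busy); message and ack buffers are not modeled.

```python
from collections import deque


class Buffer(object):
    """Hypothetical model of a tensor buffer on the source side."""

    def __init__(self):
        self.memory_ready = False  # remote buffer info not yet registered
        self.busy = False          # an ongoing transmission holds the buffer
        self.queue = deque()       # job queue of tensor ids
        self.log = []              # records what happened, for inspection

    def try_send(self, tensors):
        """Send the next queued tensor if possible; otherwise defer."""
        if not self.queue:
            return
        if not self.memory_ready:
            # Defer and ask the destination to establish the memory first.
            self.log.append(("defer", "buffer_request"))
            return
        if self.busy:
            # Defer until the ongoing transmission releases the buffer.
            self.log.append(("defer", "busy"))
            return
        tensor_id = self.queue.popleft()
        self.busy = True
        self.log.append(("write", tensor_id, tensors[tensor_id]))


class Source(object):
    """Hypothetical model of the side that owns the requested tensors."""

    def __init__(self):
        self.tensors = {}    # locally available tensors
        self.callbacks = {}  # tensor id -> pending request
        self.buffer = Buffer()

    def on_tensor_request(self, tensor_id):
        if tensor_id not in self.tensors:
            # Not produced yet: remember the request until it is.
            self.callbacks[tensor_id] = True
        else:
            self.buffer.queue.append(tensor_id)
            self.buffer.try_send(self.tensors)

    def on_tensor_produced(self, tensor_id, value):
        self.tensors[tensor_id] = value
        if self.callbacks.pop(tensor_id, None):
            self.buffer.queue.append(tensor_id)
            self.buffer.try_send(self.tensors)
```

In this model a request for a tensor that does not exist yet only leaves a callback; producing the tensor enqueues it, and the write happens once `memory_ready` is set and the buffer is idle.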
diff --git a/tensorflow/contrib/verbs/grpc_verbs_service.cc b/tensorflow/contrib/verbs/grpc_verbs_service.cc
index e73b2700bd..f2af6b79fb 100644
--- a/tensorflow/contrib/verbs/grpc_verbs_service.cc
+++ b/tensorflow/contrib/verbs/grpc_verbs_service.cc
@@ -117,6 +117,8 @@ Status GrpcVerbsService::GetRemoteAddressSync(
   ra.lid = request->channel().lid();
   ra.qpn = request->channel().qpn();
   ra.psn = request->channel().psn();
+  ra.snp = request->channel().snp();
+  ra.iid = request->channel().iid();
   rc->SetRemoteAddress(ra, false);
   rc->Connect();
   int i = 0;
@@ -146,6 +148,8 @@ Status GrpcVerbsService::GetRemoteAddressSync(
   channel_info->set_lid(rc->self().lid);
   channel_info->set_qpn(rc->self().qpn);
   channel_info->set_psn(rc->self().psn);
+  channel_info->set_snp(rc->self().snp);
+  channel_info->set_iid(rc->self().iid);
   for (int i = 0; i < RdmaChannel::kNumMessageBuffers; i++) {
     MemoryRegion* mr = response->add_mr();
     mr->set_remote_addr(reinterpret_cast<uint64>(mb[i]->buffer()));
diff --git a/tensorflow/contrib/verbs/rdma.cc b/tensorflow/contrib/verbs/rdma.cc
index 53d840f5d1..05df05de35 100644
--- a/tensorflow/contrib/verbs/rdma.cc
+++ b/tensorflow/contrib/verbs/rdma.cc
@@ -271,6 +271,11 @@ RdmaChannel::RdmaChannel(const RdmaAdapter* adapter, const string local_name,
     self_.lid = attr.lid;
     self_.qpn = qp_->qp_num;
     self_.psn = static_cast<uint32_t>(random::New64()) & 0xffffff;
+    union ibv_gid gid;
+    CHECK(!ibv_query_gid(adapter_->context_, (uint8_t)1, 0, &gid))
+        << "Query gid";
+    self_.snp = gid.global.subnet_prefix;
+    self_.iid = gid.global.interface_id;
   }
 
   // create message and ack buffers, then initialize the tables.
@@ -320,11 +325,15 @@ void RdmaChannel::SetRemoteAddress(const RdmaAddress& ra, bool override) {
     remote_.lid = ra.lid;
     remote_.qpn = ra.qpn;
     remote_.psn = ra.psn;
+    remote_.snp = ra.snp;
+    remote_.iid = ra.iid;
     remote_set_ = true;
   } else {
     CHECK(remote_.lid == ra.lid);
     CHECK(remote_.qpn == ra.qpn);
     CHECK(remote_.psn == ra.psn);
+    CHECK(remote_.snp == ra.snp);
+    CHECK(remote_.iid == ra.iid);
   }
 }
 
@@ -472,7 +481,11 @@ void RdmaChannel::Connect(const RdmaAddress& remoteAddr) {
     attr.rq_psn = remoteAddr.psn;
     attr.max_dest_rd_atomic = 1;
     attr.min_rnr_timer = 12;
-    attr.ah_attr.is_global = 0;
+    attr.ah_attr.is_global = 1;
+    attr.ah_attr.grh.dgid.global.subnet_prefix = remoteAddr.snp;
+    attr.ah_attr.grh.dgid.global.interface_id = remoteAddr.iid;
+    attr.ah_attr.grh.flow_label = 0;
+    attr.ah_attr.grh.hop_limit = 255;
     attr.ah_attr.dlid = remoteAddr.lid;
     attr.ah_attr.sl = 0;
     attr.ah_attr.src_path_bits = 0;
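With this change the queue pair is connected with a global route header (`is_global = 1`), and each side exchanges its GID, queried from GID index 0 on port 1, as two 64-bit halves: `snp` (subnet prefix) and `iid` (interface ID). The Python sketch below only illustrates how those halves make up a 128-bit GID; `gid_from_parts` and `gid_parts` are hypothetical helpers. It treats each half as the integer whose big-endian bytes form that half of the GID (the layout of `ibv_gid.raw`). Note that the C++ code above copies the two `uint64` values opaquely between `ibv_gid` and the proto without byte swapping.

```python
import ipaddress
import struct


def gid_from_parts(subnet_prefix, interface_id):
    """Combine two 64-bit halves into a 128-bit GID (hypothetical helper)."""
    for part in (subnet_prefix, interface_id):
        if not 0 <= part < 1 << 64:
            raise ValueError("each GID half must fit in 64 bits")
    # The subnet prefix occupies the high-order 8 bytes of the GID.
    return (subnet_prefix << 64) | interface_id


def gid_parts(raw16):
    """Split 16 raw big-endian GID bytes into (subnet_prefix, interface_id)."""
    return struct.unpack(">QQ", raw16)


# A link-local GID: subnet prefix fe80::/64 plus a 64-bit interface id.
gid = gid_from_parts(0xFE80000000000000, 0x0002C90300A1B2C3)
print(ipaddress.IPv6Address(gid))
```

GIDs share the IPv6 address format, which is why `ipaddress` can pretty-print the combined value.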
diff --git a/tensorflow/contrib/verbs/rdma.h b/tensorflow/contrib/verbs/rdma.h
index ae2aa63e3f..10cbbe58d9 100644
--- a/tensorflow/contrib/verbs/rdma.h
+++ b/tensorflow/contrib/verbs/rdma.h
@@ -40,6 +40,8 @@ struct RdmaAddress {
   uint32_t lid;
   uint32_t qpn;
   uint32_t psn;
+  uint64_t snp;
+  uint64_t iid;
 };
 // structure to save information for remote memory regions.
 struct RemoteMR {
diff --git a/tensorflow/contrib/verbs/rdma_mgr.cc b/tensorflow/contrib/verbs/rdma_mgr.cc
index e28b80c6f6..09b878843f 100644
--- a/tensorflow/contrib/verbs/rdma_mgr.cc
+++ b/tensorflow/contrib/verbs/rdma_mgr.cc
@@ -69,6 +69,8 @@ void RdmaMgr::SetupChannels() {
     channel_info->set_lid(rc->self_.lid);
     channel_info->set_qpn(rc->self_.qpn);
     channel_info->set_psn(rc->self_.psn);
+    channel_info->set_snp(rc->self_.snp);
+    channel_info->set_iid(rc->self_.iid);
     for (int i = 0; i < RdmaChannel::kNumMessageBuffers; i++) {
       MemoryRegion* mr = req.add_mr();
       mr->set_remote_addr(
@@ -85,6 +87,8 @@ void RdmaMgr::SetupChannels() {
       ra.lid = resp.channel().lid();
       ra.qpn = resp.channel().qpn();
       ra.psn = resp.channel().psn();
+      ra.snp = resp.channel().snp();
+      ra.iid = resp.channel().iid();
       rc->SetRemoteAddress(ra, false);
       rc->Connect();
       int i = 0;
diff --git a/tensorflow/contrib/verbs/verbs_service.proto b/tensorflow/contrib/verbs/verbs_service.proto
index b985febfb8..0df1fed4b9 100644
--- a/tensorflow/contrib/verbs/verbs_service.proto
+++ b/tensorflow/contrib/verbs/verbs_service.proto
@@ -30,6 +30,8 @@ message Channel {
   int32 lid = 1;
   int32 qpn = 2;
   int32 psn = 3;
+  uint64 snp = 4;
+  uint64 iid = 5;
 }
 
 message MemoryRegion {
diff --git a/tensorflow/core/distributed_runtime/rpc/grpc_call.h b/tensorflow/core/distributed_runtime/rpc/grpc_call.h
index 35f849c7a5..3b45e7e8a7 100644
--- a/tensorflow/core/distributed_runtime/rpc/grpc_call.h
+++ b/tensorflow/core/distributed_runtime/rpc/grpc_call.h
@@ -16,6 +16,7 @@ limitations under the License.
 #ifndef THIRD_PARTY_TENSORFLOW_CORE_DISTRIBUTED_RUNTIME_RPC_GRPC_CALL_H_
 #define THIRD_PARTY_TENSORFLOW_CORE_DISTRIBUTED_RUNTIME_RPC_GRPC_CALL_H_
 
+#include "tensorflow/core/lib/core/refcount.h"
 #include "tensorflow/core/platform/macros.h"
 
 #include "grpc++/grpc++.h"
diff --git a/tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.cc b/tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.cc
index c6260afa20..ba206890ce 100644
--- a/tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.cc
+++ b/tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.cc
@@ -16,6 +16,7 @@ limitations under the License.
 #include "tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.h"
 #include "grpc++/support/byte_buffer.h"
 #include "grpc++/support/slice.h"
+#include "tensorflow/core/common_runtime/dma_helper.h"
 #include "tensorflow/core/framework/tensor.h"
 #include "tensorflow/core/framework/tensor_reference.h"
 #include "tensorflow/core/lib/gtl/inlined_vector.h"
@@ -27,10 +28,9 @@ namespace tensorflow {
 namespace grpc {
 
 static void do_nothing(void* raw) {}
-static void unref_tensorreference(void* raw) {
-  TensorReference* ref = static_cast<TensorReference*>(raw);
-  ref->Unref();
-  delete ref;
+static void unref_tensorbuffer(void* raw) {
+  TensorBuffer* buf = static_cast<TensorBuffer*>(raw);
+  buf->Unref();
 }
 
 void EncodeRecvTensorResponseToByteBuffer(const RecvTensorResponse& proto,
@@ -219,8 +219,8 @@ void EncodeTensorToByteBuffer(bool is_dead, const Tensor& val,
 
     if (tensor_data_is_large) {
       // Encode the actual tensor data by pointing to the backing store,
-      // and add a special zero-length slice that is really a TensorReference
-      // object that we will destroy when we are done.
+      // and add a special zero-length slice that is really a TensorBuffer
+      // reference that we will unref when we are done.
       //
       // TODO(jeff): Note that this approach relies on the fact that
       // slices are destroyed in the order in which they are added to
@@ -241,17 +241,15 @@ void EncodeTensorToByteBuffer(bool is_dead, const Tensor& val,
 
       // (E) Encode tensor data, but by sharing backing store
 
-      // TODO(jeff,sanjay): It'd be nice to avoid this TensorReference
-      // allocation, and instead get our hands on the underlying
-      // TensorBuffer object and just directly ref it here and unref
-      // it in unref_tensorreference.
-      TensorReference* ref = new TensorReference(val);
+      const TensorBuffer* buf = DMAHelper::buffer(&val);
+      buf->Ref();
       gpr_slice s1 = gpr_slice_new(
           const_cast<void*>(static_cast<const void*>(tdata.data())),
           tdata.size(), do_nothing);
       slices[1] = ::grpc::Slice(s1, ::grpc::Slice::STEAL_REF);
 
-      gpr_slice s2 = gpr_slice_new(ref, 0, unref_tensorreference);
+      gpr_slice s2 =
+          gpr_slice_new(const_cast<TensorBuffer*>(buf), 0, unref_tensorbuffer);
       slices[2] = ::grpc::Slice(s2, ::grpc::Slice::STEAL_REF);
       num_slices += 2;
     }
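This change drops the heap-allocated `TensorReference` and instead takes a reference directly on the tensor's `TensorBuffer`: the buffer is `Ref()`ed before the zero-copy data slice is created, and `Unref()`ed by the release callback of a zero-length sentinel slice. The Python model below uses hypothetical `RefCounted` and `Slice` classes (not gRPC or TensorFlow APIs) to show why the backing store stays alive until the byte buffer releases its slices, assuming slices are destroyed in the order they were added, as the TODO in the code notes.

```python
class RefCounted(object):
    """Hypothetical stand-in for a reference-counted TensorBuffer."""

    def __init__(self, data):
        self.data = data
        self.refs = 1  # the owning Tensor holds one reference
        self.freed = False

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # the backing store would be deallocated here


class Slice(object):
    """Hypothetical slice: a view plus a callback run when it is destroyed."""

    def __init__(self, view, release):
        self.view = view
        self.release = release

    def destroy(self):
        self.release()


def encode(buf):
    """Share buf's storage without copying, pinning it with an extra ref."""
    buf.ref()
    data_slice = Slice(buf.data, lambda: None)  # plays the role of do_nothing
    sentinel = Slice(b"", buf.unref)            # plays unref_tensorbuffer
    return [data_slice, sentinel]
```

Even if the tensor drops its own reference while the response is in flight, the storage is only released once the sentinel slice is destroyed.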
diff --git a/tensorflow/core/kernels/cwise_op_atan2.cc b/tensorflow/core/kernels/cwise_op_atan2.cc
new file mode 100644
index 0000000000..68f67c444e
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_atan2.cc
@@ -0,0 +1,23 @@
+/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(BinaryOp, CPU, "Atan2", functor::atan2, float, double);
+#if GOOGLE_CUDA
+REGISTER2(BinaryOp, GPU, "Atan2", functor::atan2, float, double);
+#endif
+}  // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_gpu_atan2.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_atan2.cu.cc
new file mode 100644
index 0000000000..137e14ef84
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_atan2.cu.cc
@@ -0,0 +1,26 @@
+/* Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY2(atan2, float, double);
+}  // namespace functor
+}  // namespace tensorflow
+
+#endif  // GOOGLE_CUDA
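`Atan2(y, x)` differs from `atan(y / x)` because it uses the signs of both arguments to choose the quadrant, which is what makes the identities x = r cos(θ) and y = r sin(θ) in the op documentation hold. The check below uses Python's standard `math.atan2` as a reference for that behavior, not the TensorFlow kernel itself; `polar_roundtrip` is a hypothetical helper.

```python
import math


def polar_roundtrip(y, x):
    """Recover (y, x) from r = hypot(x, y) and theta = atan2(y, x)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return r * math.sin(theta), r * math.cos(theta)


# A point in the third quadrant.
y, x = -1.0, -1.0
naive = math.atan(y / x)  # pi/4: the signs cancel, so the quadrant is lost
angle = math.atan2(y, x)  # -3*pi/4: the correct angle in [-pi, pi]
```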
diff --git a/tensorflow/core/kernels/cwise_ops.h b/tensorflow/core/kernels/cwise_ops.h
index fb4c649a82..423307fd4c 100644
--- a/tensorflow/core/kernels/cwise_ops.h
+++ b/tensorflow/core/kernels/cwise_ops.h
@@ -658,6 +658,22 @@ struct zeta : base<T, Eigen::internal::scalar_zeta_op<T>> {};
 template <typename T>
 struct polygamma : base<T, Eigen::internal::scalar_polygamma_op<T>> {};
 
+template <typename Scalar>
+struct scalar_atan2_op {
+  EIGEN_EMPTY_STRUCT_CTOR(scalar_atan2_op)
+  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar
+  operator()(const Scalar& y, const Scalar& x) const {
+#if GOOGLE_CUDA
+    return ::atan2(y, x);
+#else
+    return std::atan2(y, x);
+#endif
+  }
+};
+
+template <typename T>
+struct atan2 : base<T, scalar_atan2_op<T>> {};
+
 template <typename T>
 struct squared_difference
     : base<T, Eigen::internal::scalar_compose_op<
diff --git a/tensorflow/core/kernels/linalg_ops_common.cc b/tensorflow/core/kernels/linalg_ops_common.cc
index a38ddf76eb..155d9d1084 100644
--- a/tensorflow/core/kernels/linalg_ops_common.cc
+++ b/tensorflow/core/kernels/linalg_ops_common.cc
@@ -15,6 +15,8 @@ limitations under the License.
 
 #include "tensorflow/core/kernels/linalg_ops_common.h"
 
+#include <utility>
+
 #include "third_party/eigen3/Eigen/Core"
 #include "tensorflow/core/framework/device_base.h"
 #include "tensorflow/core/framework/kernel_def_builder.h"
@@ -153,10 +155,9 @@ void LinearAlgebraOp<Scalar>::AnalyzeInputs(OpKernelContext* context,
     const int col_dimension = input_rank - 1;
     const int64 num_rows = in.dim_size(row_dimension);
     const int64 num_cols = in.dim_size(col_dimension);
-    // TODO(rmlarsen): Use emplace_back when it is added to InlinedVector. Same
-    // in several places below.
-    input_matrix_shapes->push_back(TensorShape({num_rows, num_cols}));
-    inputs->push_back(&in);
+    input_matrix_shapes->emplace_back(
+        std::initializer_list<int64>({num_rows, num_cols}));
+    inputs->emplace_back(&in);
   }
   // Have the derived class validate that the inputs are as expected.
   ValidateInputMatrixShapes(context, *input_matrix_shapes);
@@ -198,9 +199,7 @@ void LinearAlgebraOp<Scalar>::PrepareOutputs(
       // concatenated with the output_matrix_shape (if the output is not
       // scalar).
       output_tensor_shape = batch_shape;
-      for (int dim = 0; dim < output_matrix_shape.dims(); ++dim) {
-        output_tensor_shape.AddDim(output_matrix_shape.dim_size(dim));
-      }
+      output_tensor_shape.AppendShape(output_matrix_shape);
     }
     Tensor* out = nullptr;
     // See if there is an input buffer we can reuse for this output.
@@ -219,7 +218,7 @@ void LinearAlgebraOp<Scalar>::PrepareOutputs(
       OP_REQUIRES_OK(context, context->allocate_output(
                                   output_idx, output_tensor_shape, &out));
     }
-    outputs->push_back(out);
+    outputs->emplace_back(out);
   }
 }
 
@@ -232,11 +231,10 @@ void LinearAlgebraOp<Scalar>::ComputeTensorSlice(
   for (size_t i = 0; i < inputs.size(); ++i) {
     // TODO(kalakris): Handle alignment if possible. Eigen::Map is
     // unaligned by default.
-    matrix_inputs.push_back(
-        ConstMatrixMap(inputs[i]->flat<Scalar>().data() +
-                           matrix_index * input_matrix_shapes[i].num_elements(),
-                       input_matrix_shapes[i].dim_size(0),
-                       input_matrix_shapes[i].dim_size(1)));
+    matrix_inputs.emplace_back(
+        inputs[i]->flat<Scalar>().data() +
+            matrix_index * input_matrix_shapes[i].num_elements(),
+        input_matrix_shapes[i].dim_size(0), input_matrix_shapes[i].dim_size(1));
   }
 
   MatrixMaps matrix_outputs;
@@ -248,10 +246,10 @@ void LinearAlgebraOp<Scalar>::ComputeTensorSlice(
     int num_output_cols = output_matrix_shapes[i].dims() == 2
                               ? output_matrix_shapes[i].dim_size(1)
                               : 1;
-    matrix_outputs.push_back(
-        MatrixMap(outputs[i]->flat<Scalar>().data() +
-                      matrix_index * output_matrix_shapes[i].num_elements(),
-                  num_output_rows, num_output_cols));
+    matrix_outputs.emplace_back(
+        outputs[i]->flat<Scalar>().data() +
+            matrix_index * output_matrix_shapes[i].num_elements(),
+        num_output_rows, num_output_cols);
   }
   ComputeMatrix(context, matrix_inputs, &matrix_outputs);
 }
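`ComputeTensorSlice` locates matrix number `matrix_index` of a batched input by offsetting the flat data pointer by `matrix_index * num_elements()` of the per-matrix shape, and `PrepareOutputs` builds the output shape by appending the matrix shape to the batch shape. The pure-Python sketch below mirrors that arithmetic on a flat list, assuming row-major storage (as TensorFlow tensors use); `matrix_slice` and `output_shape` are hypothetical helpers.

```python
def matrix_slice(flat, matrix_index, rows, cols):
    """Return matrix `matrix_index` from a flat row-major batch of matrices."""
    offset = matrix_index * rows * cols  # matrix_index * num_elements()
    chunk = flat[offset:offset + rows * cols]
    return [chunk[r * cols:(r + 1) * cols] for r in range(rows)]


def output_shape(batch_shape, matrix_shape):
    """Batch dimensions followed by the matrix dimensions, like AppendShape."""
    return list(batch_shape) + list(matrix_shape)


# A batch of two 2 x 3 matrices stored contiguously: values 0..11.
flat = list(range(12))
second = matrix_slice(flat, 1, 2, 3)
```

A scalar output (an empty matrix shape) leaves just the batch dimensions, which matches the comment in `PrepareOutputs`.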
diff --git a/tensorflow/core/kernels/linalg_ops_common.h b/tensorflow/core/kernels/linalg_ops_common.h
index fcb735bfdb..75d1d29793 100644
--- a/tensorflow/core/kernels/linalg_ops_common.h
+++ b/tensorflow/core/kernels/linalg_ops_common.h
@@ -21,10 +21,7 @@ limitations under the License.
 // computations across different threads if necessary.
 #include <algorithm>
 
-#define EIGEN_USE_THREADS
-
 #include "third_party/eigen3/Eigen/Core"
-#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
 #include "tensorflow/core/framework/kernel_def_builder.h"
 #include "tensorflow/core/framework/op_kernel.h"
 #include "tensorflow/core/framework/tensor.h"
diff --git a/tensorflow/core/ops/math_ops.cc b/tensorflow/core/ops/math_ops.cc
index 494358fec6..01b35064b2 100644
--- a/tensorflow/core/ops/math_ops.cc
+++ b/tensorflow/core/ops/math_ops.cc
@@ -731,6 +731,21 @@ The polygamma function is defined as:
 where \\(\psi(x)\\) is the digamma function.
 )doc");
 
+REGISTER_OP("Atan2")
+    .Input("y: T")
+    .Input("x: T")
+    .Output("z: T")
+    .Attr("T: {float, double}")
+    .SetShapeFn(shape_inference::BroadcastBinaryOpShapeFn)
+    .Doc(R"doc(
+Computes arctangent of `y/x` element-wise, respecting signs of the arguments.
+This is the angle \( \theta \in [-\pi, \pi] \) such that
+\[ x = r \cos(\theta) \]
+and
+\[ y = r \sin(\theta) \]
+where \(r = \sqrt{x^2 + y^2}\).
+)doc");
+
 REGISTER_OP("Betainc")
     .Input("a: T")
     .Input("b: T")
diff --git a/tensorflow/docs_src/community/documentation.md b/tensorflow/docs_src/community/documentation.md
index be18d3e8be..27b8c8d302 100644
--- a/tensorflow/docs_src/community/documentation.md
+++ b/tensorflow/docs_src/community/documentation.md
@@ -4,12 +4,24 @@ We welcome contributions to the Tensorflow documentation from the community.
 This document explains how you can contribute to that documentation. In
 particular, this document explains the following:
 
-- Where the documentation is located.
-- How to make conformant edits.
-- How to build and test your documentation changes before you submit them.
-
-You can view Tensorflow documentation on tensorflow.org, and you can view and
-edit the raw files on Github.
+* Where the documentation is located.
+* How to make conformant edits.
+* How to build and test your documentation changes before you submit them.
+
+You can view TensorFlow documentation on https://www.tensorflow.org, and you
+can view and edit the raw files on GitHub. We're publishing our docs on GitHub
+so everybody can contribute. Whatever gets checked into tensorflow/docs_src will
+be published soon after on https://www.tensorflow.org.
+
+Republishing TensorFlow documentation in different forms is absolutely allowed,
+but we are unlikely to accept other documentation formats (or the tooling to
+generate them) into our repository. If you do choose to republish our
+documentation in another form, please be sure to include:
+
+* The version of the API this represents (e.g., r1.0, master)
+* The commit or version from which the documentation was generated
+* Where to get the latest documentation (that is, https://www.tensorflow.org)
+* The Apache 2.0 license
 
 ## A Note on Versions
 
diff --git a/tensorflow/docs_src/get_started/input_fn.md b/tensorflow/docs_src/get_started/input_fn.md
index 74ed5fbebf..a053617b58 100644
--- a/tensorflow/docs_src/get_started/input_fn.md
+++ b/tensorflow/docs_src/get_started/input_fn.md
@@ -12,7 +12,7 @@ When training a neural network using tf.contrib.learn, it's possible to pass
 your feature and target data directly into your `fit`, `evaluate`, or `predict`
 operations. Here's an example taken from the @{$tflearn$tf.contrib.learn quickstart tutorial}:
 
-```py
+```python
 training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
     filename=IRIS_TRAINING, target_dtype=np.int, features_dtype=np.float32)
 test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
diff --git a/tensorflow/docs_src/install/install_linux.md b/tensorflow/docs_src/install/install_linux.md
index 8ee31fe692..e2f7ae6bf9 100644
--- a/tensorflow/docs_src/install/install_linux.md
+++ b/tensorflow/docs_src/install/install_linux.md
@@ -166,7 +166,7 @@ Take the following steps to install TensorFlow with Virtualenv:
      virtualenv environment:
 
      <pre>(tensorflow)$ <b>pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl</b></pre>
 
 If you encounter installation problems, see
 [Common Installation Problems](#common_installation_problems).
@@ -271,7 +271,7 @@ take the following steps:
 
      <pre>
      $ <b>sudo pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl</b>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl</b>
      </pre>
 
      If this step fails, see
@@ -458,7 +458,7 @@ Take the following steps to install TensorFlow in an Anaconda environment:
 
      <pre>
      (tensorflow)$ <b>pip install --ignore-installed --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl</b></pre>
 
 
 <a name="ValidateYourInstallation"></a>
@@ -626,14 +626,14 @@ This section documents the relevant values for Linux installations.
 CPU only:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl
 </pre>
 
 
 GPU support:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl
 </pre>
 
 Note that GPU support requires the NVIDIA hardware and software described in
@@ -645,14 +645,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl
 </pre>
 
 
 GPU support:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp34-cp34m-linux_x86_64.whl
 </pre>
 
 Note that GPU support requires the NVIDIA hardware and software described in
@@ -664,14 +664,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp35-cp35m-linux_x86_64.whl
 </pre>
 
 
 GPU support:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl
 </pre>
 
 
@@ -683,14 +683,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp36-cp36m-linux_x86_64.whl
 </pre>
 
 
 GPU support:
 
 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp36-cp36m-linux_x86_64.whl
 </pre>
 
 
diff --git a/tensorflow/docs_src/install/install_mac.md b/tensorflow/docs_src/install/install_mac.md
index b6a5758afe..73e143b145 100644
--- a/tensorflow/docs_src/install/install_mac.md
+++ b/tensorflow/docs_src/install/install_mac.md
@@ -112,7 +112,7 @@ Take the following steps to install TensorFlow with Virtualenv:
      https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0rc2-py2-none-any.whl</b></pre>
 
 If you encounter installation problems, see
-[Common Installation Problems](#CommonInstallationProblems).
+[Common Installation Problems](#common-installation-problems).
 
 
 ### Next Steps
@@ -233,7 +233,7 @@ take the following steps:
      https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0rc2-py2-none-any.whl</b> </pre>
 
      If the preceding command fails, see
-     [Common installation problems](#CommonInstallationProblems).
+     [Common installation problems](#common-installation-problems).
 
 
 
diff --git a/tensorflow/docs_src/install/install_sources.md b/tensorflow/docs_src/install/install_sources.md
index 88268ba62f..8dd7870faa 100644
--- a/tensorflow/docs_src/install/install_sources.md
+++ b/tensorflow/docs_src/install/install_sources.md
@@ -320,10 +320,10 @@ Invoke `pip install` to install that pip package.
 The filename of the `.whl` file depends on your platform.
 For example, the following command will install the pip package
 
-for TensorFlow 1.1.0rc2 on Linux:
+for TensorFlow 1.1.0 on Linux:
 
 <pre>
-$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0rc2-py2-none-any.whl</b>
+$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0-py2-none-any.whl</b>
 </pre>
 
 ## Validate your installation
diff --git a/tensorflow/docs_src/install/install_windows.md b/tensorflow/docs_src/install/install_windows.md
index 5f7c27c028..db7c661aa1 100644
--- a/tensorflow/docs_src/install/install_windows.md
+++ b/tensorflow/docs_src/install/install_windows.md
@@ -103,7 +103,7 @@ Take the following steps to install TensorFlow in an Anaconda environment:
   2. Create a conda environment named <tt>tensorflow</tt>
      by invoking the following command:
 
-     <pre>C:\> <b>conda create -n tensorflow</b> </pre>
+     <pre>C:\> <b>conda create -n tensorflow python=3.5</b> </pre>
 
   3. Activate the conda environment by issuing the following command:
 
@@ -114,12 +114,12 @@ Take the following steps to install TensorFlow in an Anaconda environment:
      environment. To install the CPU-only version of TensorFlow, enter the
      following command:
 
-     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0rc2-cp35-cp35m-win_amd64.whl</b> </pre>
+     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl</b> </pre>
 
      To install the GPU version of TensorFlow, enter the following command
      (on a single line):
 
-     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0rc2-cp35-cp35m-win_amd64.whl</b> </pre>
+     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl</b> </pre>
 
 ## Validate your installation
 
@@ -193,5 +193,20 @@ ImportError: cannot import name 'descriptor'</pre>
   <td><pre>No module named "pywrap_tensorflow"</pre></td>
 </tr>
 
+<tr>
+  <td><a href="https://stackoverflow.com/q/42217532">42217532</a></td>
+  <td>
+  <pre>OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits</pre>
+  </td>
+</tr>
+
+<tr>
+  <td><a href="https://stackoverflow.com/q/43134753">43134753</a></td>
+  <td>
+  <pre>The TensorFlow library wasn't compiled to use SSE instructions</pre>
+  </td>
+</tr>
+
+
 </table>
 
diff --git a/tensorflow/examples/image_retraining/retrain.py b/tensorflow/examples/image_retraining/retrain.py
index 5f4b6bed48..6c1b40b442 100644
--- a/tensorflow/examples/image_retraining/retrain.py
+++ b/tensorflow/examples/image_retraining/retrain.py
@@ -369,9 +369,12 @@ def create_bottleneck_file(bottleneck_path, image_lists, label_name, index,
   if not gfile.Exists(image_path):
     tf.logging.fatal('File does not exist %s', image_path)
   image_data = gfile.FastGFile(image_path, 'rb').read()
-  bottleneck_values = run_bottleneck_on_image(sess, image_data,
-                                              jpeg_data_tensor,
-                                              bottleneck_tensor)
+  try:
+    bottleneck_values = run_bottleneck_on_image(
+        sess, image_data, jpeg_data_tensor, bottleneck_tensor)
+  except Exception as e:
+    raise RuntimeError('Error during processing file %s (%s)' % (image_path,
+                                                                 str(e)))
+
   bottleneck_string = ','.join(str(x) for x in bottleneck_values)
   with open(bottleneck_path, 'w') as bottleneck_file:
     bottleneck_file.write(bottleneck_string)
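Re-raising a failure with the name of the offending file, as this change does, is most useful when the original error text is kept in the new message, and catching `Exception` rather than using a bare `except:` avoids intercepting `KeyboardInterrupt`. A minimal sketch of that pattern; `process_file` and the `decode` callable are hypothetical stand-ins for the bottleneck computation.

```python
def process_file(path, decode):
    """Run decode(path), re-raising any failure with the file name attached."""
    try:
        return decode(path)
    except Exception as e:  # does not catch KeyboardInterrupt or SystemExit
        raise RuntimeError('Error during processing file %s (%s)' % (path, e))
```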
diff --git a/tensorflow/python/estimator/estimator_test.py b/tensorflow/python/estimator/estimator_test.py
index b8064f0a77..5d9913d734 100644
--- a/tensorflow/python/estimator/estimator_test.py
+++ b/tensorflow/python/estimator/estimator_test.py
@@ -54,6 +54,7 @@ from tensorflow.python.platform import test
 from tensorflow.python.platform import tf_logging as logging
 from tensorflow.python.saved_model import loader
 from tensorflow.python.saved_model import tag_constants
+from tensorflow.python.summary.writer import writer_cache
 from tensorflow.python.training import checkpoint_state_pb2
 from tensorflow.python.training import saver
 from tensorflow.python.training import saver_test_utils
@@ -436,8 +437,12 @@ class EstimatorTrainTest(test.TestCase):
         model_fn=model_fn_global_step_incrementer)
     est1.train(dummy_input_fn, steps=5)
 
+    # We have to clear the cache before we can rename the directory,
+    # otherwise open file handles will prevent the rename on Windows.
+    writer_cache.FileWriterCache.clear()
     model_dir2 = os.path.join(tmpdir, 'model_dir2')
     os.renames(model_dir1, model_dir2)
+
     est2 = estimator.Estimator(
         model_dir=model_dir2,
         model_fn=model_fn_global_step_incrementer)
diff --git a/tensorflow/python/framework/ops.py b/tensorflow/python/framework/ops.py
index 05972022d0..0b04904ec2 100644
--- a/tensorflow/python/framework/ops.py
+++ b/tensorflow/python/framework/ops.py
@@ -2705,11 +2705,11 @@ class Graph(object):
     Args:
       name: The key for the collection. For example, the `GraphKeys` class
         contains many standard names for collections.
-      scope: (Optional.) If supplied, the resulting list is filtered to include
-        only items whose `name` attribute matches using `re.match`. Items
-        without a `name` attribute are never returned if a scope is supplied and
-        the choice or `re.match` means that a `scope` without special tokens
-        filters by prefix.
+      scope: (Optional.) A string. If supplied, the resulting list is filtered
+        to include only items whose `name` attribute matches `scope` using
+        `re.match`. Items without a `name` attribute are never returned if a
+        scope is supplied. The choice of `re.match` means that a `scope` without
+        special tokens filters by prefix.
 
     Returns:
       The list of values in the collection with the given `name`, or
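The `scope` semantics described in the `get_collection` docstring amount to the filter below. It is a simplified, hypothetical re-implementation for illustration (`filter_by_scope` and `Item` are not TensorFlow APIs), which treats an item without a `name` attribute as absent whenever a scope is given.

```python
import re
from collections import namedtuple

Item = namedtuple("Item", ["name"])


def filter_by_scope(items, scope=None):
    """Keep items whose `name` matches `scope` via re.match (a prefix match)."""
    if scope is None:
        return list(items)
    result = []
    for item in items:
        name = getattr(item, "name", None)
        # Items without a `name` attribute are dropped when a scope is given.
        if name is not None and re.match(scope, name):
            result.append(item)
    return result


items = [Item("layer1/weights:0"), Item("layer1/bias:0"),
         Item("layer10/weights:0"), object()]
```

Because `re.match` anchors only at the start of the string, the scope `"layer1"` also matches `"layer10/weights:0"`; adding a trailing `/` restricts the match to a single scope.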
diff --git a/tensorflow/python/kernel_tests/cwise_ops_test.py b/tensorflow/python/kernel_tests/cwise_ops_test.py
index cd0d33ecf3..0846470abc 100644
--- a/tensorflow/python/kernel_tests/cwise_ops_test.py
+++ b/tensorflow/python/kernel_tests/cwise_ops_test.py
@@ -615,6 +615,13 @@ class BinaryOpTest(test.TestCase):
     self._compareBoth(x, y, np.multiply, _MUL)
     self._compareBoth(x, y + 0.1, np.true_divide, _TRUEDIV)
     self._compareBoth(x, y + 0.1, np.floor_divide, _FLOORDIV)
+    self._compareBoth(x, y, np.arctan2, math_ops.atan2)
+    x1 = np.random.randn(5, 6).astype(np.float32)
+    x2 = np.random.randn(5, 6).astype(np.float32)
+    # Remove tiny values--atan2 gradients are flaky near the origin.
+    x1[np.abs(x1) < 0.05] = 0.05 * np.sign(x1[np.abs(x1) < 0.05])
+    x2[np.abs(x2) < 0.05] = 0.05 * np.sign(x2[np.abs(x2) < 0.05])
+    self._compareBoth(x1, x2, np.arctan2, math_ops.atan2)
     try:
       from scipy import special  # pylint: disable=g-import-not-at-top
       a_pos_small = np.linspace(0.1, 2, 15).reshape(1, 3, 5).astype(np.float32)
@@ -672,6 +679,13 @@ class BinaryOpTest(test.TestCase):
     self._compareBoth(x, y, np.multiply, _MUL)
     self._compareBoth(x, y + 0.1, np.true_divide, _TRUEDIV)
     self._compareBoth(x, y + 0.1, np.floor_divide, _FLOORDIV)
+    self._compareBoth(x, y, np.arctan2, math_ops.atan2)
+    x1 = np.random.randn(7, 4).astype(np.float64)
+    x2 = np.random.randn(7, 4).astype(np.float64)
+    # Remove tiny values--atan2 gradients are flaky near the origin.
+    x1[np.abs(x1) < 0.5] = 0.5 * np.sign(x1[np.abs(x1) < 0.5])
+    x2[np.abs(x2) < 0.5] = 0.5 * np.sign(x2[np.abs(x2) < 0.5])
+    self._compareBoth(x1, x2, np.arctan2, math_ops.atan2)
     try:
       from scipy import special  # pylint: disable=g-import-not-at-top
       a_pos_small = np.linspace(0.1, 2, 15).reshape(1, 3, 5).astype(np.float32)
@@ -1090,6 +1104,19 @@ class BinaryOpTest(test.TestCase):
           error = gradient_checker.compute_gradient_error(y, [], z, [])
           self.assertLess(error, 2e-4)
 
+  def testAtan2SpecialValues(self):
+    x1l, x2l = zip((+0.0, +0.0), (+0.0, -0.0), (-0.0, +0.0), (-0.0, -0.0),
+                   (1.2345, float("inf")), (1.2345, -float("inf")),
+                   (-4.321, float("inf")), (-4.125, -float("inf")),
+                   (float("inf"), float("inf")), (float("inf"), -float("inf")),
+                   (-float("inf"), float("inf")), (-float("inf"),
+                                                   -float("inf")))
+    for dtype in np.float32, np.float64:
+      x1 = np.array(x1l).astype(dtype)
+      x2 = np.array(x2l).astype(dtype)
+      self._compareCpu(x1, x2, np.arctan2, math_ops.atan2)
+      self._compareGpu(x1, x2, np.arctan2, math_ops.atan2)
+
 
 class ComparisonOpTest(test.TestCase):
 
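The special-value cases in `testAtan2SpecialValues` follow the C99/IEEE 754 rules for `atan2(y, x)`, which NumPy's `np.arctan2` also implements; that is why it can serve as the reference. The sketch below checks a subset of those inputs in both float types and shows how the sign of a zero `x` decides between 0 and ±π. The `CASES` table is written for this illustration.

```python
import math

import numpy as np

INF = float("inf")

# (y, x, expected) triples drawn from testAtan2SpecialValues. A list is used
# rather than a dict because 0.0 == -0.0, so signed-zero keys would collide.
CASES = [
    (+0.0, +0.0, 0.0),
    (+0.0, -0.0, math.pi),    # negative-zero x yields +pi
    (-0.0, -0.0, -math.pi),
    (1.2345, -INF, math.pi),
    (-4.125, -INF, -math.pi),
    (INF, INF, math.pi / 4),
    (-INF, -INF, -3 * math.pi / 4),
]

for dtype in (np.float32, np.float64):
  for y, x, expected in CASES:
    got = np.arctan2(dtype(y), dtype(x))
    assert math.isclose(got, expected, rel_tol=1e-6), (dtype, y, x, got)
print("np.arctan2 matches the C99 special-value table")
```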
diff --git a/tensorflow/python/kernel_tests/tensor_priority_test.py b/tensorflow/python/kernel_tests/tensor_priority_test.py
new file mode 100644
index 0000000000..574538a837
--- /dev/null
+++ b/tensorflow/python/kernel_tests/tensor_priority_test.py
@@ -0,0 +1,86 @@
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for the binary ops priority mechanism."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import test as test_lib
+
+
+class TensorPriorityTest(test_lib.TestCase):
+
+  def testSupportedRhsWithoutDelegation(self):
+
+    class NumpyArraySubclass(np.ndarray):
+      pass
+
+    supported_rhs_without_delegation = (3, 3.0, [1.0, 2.0], np.array(
+        [1.0, 2.0]), NumpyArraySubclass(
+            shape=(1, 2), buffer=np.array([1.0, 2.0])),
+                                        ops.convert_to_tensor([[1.0, 2.0]]))
+    for rhs in supported_rhs_without_delegation:
+      tensor = ops.convert_to_tensor([[10.0, 20.0]])
+      res = tensor + rhs
+      self.assertIsInstance(res, ops.Tensor)
+
+  def testUnsupportedRhsWithoutDelegation(self):
+
+    class WithoutReverseAdd(object):
+      pass
+
+    tensor = ops.convert_to_tensor([[10.0, 20.0]])
+    rhs = WithoutReverseAdd()
+    with self.assertRaisesWithPredicateMatch(
+        TypeError, lambda e: "Expected float" in str(e)):
+      # pylint: disable=pointless-statement
+      tensor + rhs
+
+  def testUnsupportedRhsWithDelegation(self):
+
+    class WithReverseAdd(object):
+
+      def __radd__(self, lhs):
+        return "Works!"
+
+    tensor = ops.convert_to_tensor([[10.0, 20.0]])
+    rhs = WithReverseAdd()
+    res = tensor + rhs
+    self.assertEqual(res, "Works!")
+
+  def testFullDelegationControlUsingRegistry(self):
+
+    class NumpyArraySubclass(np.ndarray):
+
+      def __radd__(self, lhs):
+        return "Works!"
+
+    def raise_to_delegate(value, dtype=None, name=None, as_ref=False):
+      del value, dtype, name, as_ref  # Unused.
+      raise TypeError
+
+    ops.register_tensor_conversion_function(
+        NumpyArraySubclass, raise_to_delegate, priority=0)
+    tensor = ops.convert_to_tensor([[10.0, 20.0]])
+    rhs = NumpyArraySubclass(shape=(1, 2), buffer=np.array([1.0, 2.0]))
+    res = tensor + rhs
+    self.assertEqual(res, "Works!")
+
+
+if __name__ == "__main__":
+  test_lib.main()
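The tests above depend on Python's reflected-operator protocol. When `lhs.__add__(rhs)` returns `NotImplemented`, Python calls `rhs.__radd__(lhs)`, and an error surfaces only if neither side handles the operation. Below is a framework-free sketch of that pattern. `ToyTensor`, `_convert`, and the rule "delegate only if the right-hand type defines `__radd__`" are hypothetical, modeled on what the tests check rather than taken from TensorFlow's implementation.

```python
class ToyTensor(object):
  """Hypothetical stand-in for a tensor type with a conversion step."""

  def __init__(self, value):
    self.value = float(value)

  @staticmethod
  def _convert(other):
    # Plays the role of a conversion function: accept numbers and
    # ToyTensors, raise TypeError for anything else.
    if isinstance(other, ToyTensor):
      return other
    if isinstance(other, (int, float)):
      return ToyTensor(other)
    raise TypeError("Expected float, got %r" % type(other).__name__)

  def __add__(self, other):
    try:
      other = self._convert(other)
    except TypeError:
      # Delegate to the right-hand operand only if its type can handle the
      # operation; otherwise re-raise the conversion error unchanged.
      if hasattr(type(other), "__radd__"):
        return NotImplemented
      raise
    return ToyTensor(self.value + other.value)


class WithReverseAdd(object):

  def __radd__(self, lhs):
    return "Works!"


print((ToyTensor(10.0) + 3).value)         # 13.0
print(ToyTensor(10.0) + WithReverseAdd())  # Works!
```

Returning `NotImplemented` rather than raising is what lets Python try `__radd__`. Re-raising in the other case keeps the original "Expected float" message for operands that cannot handle the addition either.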
diff --git a/tensorflow/python/layers/convolutional.py b/tensorflow/python/layers/convolutional.py
index b2fe9feb44..938161f426 100644
--- a/tensorflow/python/layers/convolutional.py
+++ b/tensorflow/python/layers/convolutional.py
@@ -972,7 +972,7 @@ def separable_conv2d(inputs,
 
 
 class Conv2DTranspose(Conv2D):
-  """Transposed convolution layer (sometimes called Deconvolution).
+  """Transposed 2D convolution layer (sometimes called 2D Deconvolution).
 
   The need for transposed convolutions generally arises
   from the desire to use a transformation going in the opposite direction
@@ -1086,19 +1086,9 @@ class Conv2DTranspose(Conv2D):
     kernel_h, kernel_w = self.kernel_size
     stride_h, stride_w = self.strides
 
-    def get_deconv_dim(dim_size, stride_size, kernel_size, padding):
-      if isinstance(dim_size, ops.Tensor):
-        dim_size = math_ops.multiply(dim_size, stride_size)
-      elif dim_size is not None:
-        dim_size *= stride_size
-
-      if padding == 'valid' and dim_size is not None:
-        dim_size += max(kernel_size - stride_size, 0)
-      return dim_size
-
     # Infer the dynamic output shape:
-    out_height = get_deconv_dim(height, stride_h, kernel_h, self.padding)
-    out_width = get_deconv_dim(width, stride_w, kernel_w, self.padding)
+    out_height = utils.get_deconv_dim(height, stride_h, kernel_h, self.padding)
+    out_width = utils.get_deconv_dim(width, stride_w, kernel_w, self.padding)
 
     if self.data_format == 'channels_first':
       output_shape = (batch_size, self.filters, out_height, out_width)
@@ -1119,10 +1109,10 @@ class Conv2DTranspose(Conv2D):
     # Infer the static output shape:
     out_shape = inputs.get_shape().as_list()
     out_shape[c_axis] = self.filters
-    out_shape[h_axis] = get_deconv_dim(
-        out_shape[h_axis], stride_h, kernel_h, self.padding)
-    out_shape[w_axis] = get_deconv_dim(
-        out_shape[w_axis], stride_w, kernel_w, self.padding)
+    out_shape[h_axis] = utils.get_deconv_dim(out_shape[h_axis], stride_h,
+                                             kernel_h, self.padding)
+    out_shape[w_axis] = utils.get_deconv_dim(out_shape[w_axis], stride_w,
+                                             kernel_w, self.padding)
     outputs.set_shape(out_shape)
 
     if self.bias:
@@ -1152,7 +1142,7 @@ def conv2d_transpose(inputs,
                      trainable=True,
                      name=None,
                      reuse=None):
-  """Transposed convolution layer (sometimes called Deconvolution).
+  """Functional interface for transposed 2D convolution layer.
 
   The need for transposed convolutions generally arises
   from the desire to use a transformation going in the opposite direction
@@ -1177,6 +1167,250 @@ def conv2d_transpose(inputs,
       `channels_last` corresponds to inputs with shape
       `(batch, height, width, channels)` while `channels_first` corresponds to
       inputs with shape `(batch, channels, height, width)`.
+    activation: Activation function. Set it to `None` to maintain a
+      linear activation.
+    use_bias: Boolean, whether the layer uses a bias.
+    kernel_initializer: An initializer for the convolution kernel.
+    bias_initializer: An initializer for the bias vector. If `None`, then no
+      bias will be applied.
+    kernel_regularizer: Optional regularizer for the convolution kernel.
+    bias_regularizer: Optional regularizer for the bias vector.
+    activity_regularizer: Regularizer function for the output.
+    trainable: Boolean, if `True` also add variables to the graph collection
+      `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
+    name: A string, the name of the layer.
+    reuse: Boolean, whether to reuse the weights of a previous layer
+      by the same name.
+
+  Returns:
+    Output tensor.
+  """
+  layer = Conv2DTranspose(
+      filters=filters,
+      kernel_size=kernel_size,
+      strides=strides,
+      padding=padding,
+      data_format=data_format,
+      activation=activation,
+      use_bias=use_bias,
+      kernel_initializer=kernel_initializer,
+      bias_initializer=bias_initializer,
+      kernel_regularizer=kernel_regularizer,
+      bias_regularizer=bias_regularizer,
+      activity_regularizer=activity_regularizer,
+      trainable=trainable,
+      name=name,
+      _reuse=reuse,
+      _scope=name)
+  return layer.apply(inputs)
+
+
+class Conv3DTranspose(Conv3D):
+  """Transposed 3D convolution layer (sometimes called 3D Deconvolution).
+
+  Arguments:
+    filters: Integer, the dimensionality of the output space (i.e. the number
+      of filters in the convolution).
+    kernel_size: An integer or tuple/list of 3 integers, specifying the
+      depth, height and width of the 3D convolution window.
+      Can be a single integer to specify the same value for all spatial
+      dimensions.
+    strides: An integer or tuple/list of 3 integers, specifying the strides
+      of the convolution along the depth, height and width.
+      Can be a single integer to specify the same value for all spatial
+      dimensions.
+    padding: One of `"valid"` or `"same"` (case-insensitive).
+    data_format: A string, one of `channels_last` (default) or `channels_first`.
+      The ordering of the dimensions in the inputs.
+      `channels_last` corresponds to inputs with shape
+      `(batch, depth, height, width, channels)` while `channels_first`
+      corresponds to inputs with shape
+      `(batch, channels, depth, height, width)`.
+    activation: Activation function. Set it to `None` to maintain a
+      linear activation.
+    use_bias: Boolean, whether the layer uses a bias.
+    kernel_initializer: An initializer for the convolution kernel.
+    bias_initializer: An initializer for the bias vector. If `None`, then no
+      bias will be applied.
+    kernel_regularizer: Optional regularizer for the convolution kernel.
+    bias_regularizer: Optional regularizer for the bias vector.
+    activity_regularizer: Regularizer function for the output.
+    trainable: Boolean, if `True` also add variables to the graph collection
+      `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
+    name: A string, the name of the layer.
+  """
+
+  def __init__(self,
+               filters,
+               kernel_size,
+               strides=(1, 1, 1),
+               padding='valid',
+               data_format='channels_last',
+               activation=None,
+               use_bias=True,
+               kernel_initializer=None,
+               bias_initializer=init_ops.zeros_initializer(),
+               kernel_regularizer=None,
+               bias_regularizer=None,
+               activity_regularizer=None,
+               trainable=True,
+               name=None,
+               **kwargs):
+    super(Conv3DTranspose, self).__init__(
+        filters=filters,
+        kernel_size=kernel_size,
+        strides=strides,
+        padding=padding,
+        data_format=data_format,
+        activation=activation,
+        use_bias=use_bias,
+        kernel_initializer=kernel_initializer,
+        bias_initializer=bias_initializer,
+        kernel_regularizer=kernel_regularizer,
+        bias_regularizer=bias_regularizer,
+        activity_regularizer=activity_regularizer,
+        trainable=trainable,
+        name=name,
+        **kwargs)
+
+  def build(self, input_shape):
+    if len(input_shape) != 5:
+      raise ValueError('Inputs should have rank 5, received input shape: ' +
+                       str(input_shape))
+    if self.data_format == 'channels_first':
+      channel_axis = 1
+    else:
+      channel_axis = -1
+    if input_shape[channel_axis] is None:
+      raise ValueError('The channel dimension of the inputs '
+                       'should be defined, found None: ' + str(input_shape))
+    input_dim = input_shape[channel_axis]
+    kernel_shape = self.kernel_size + (self.filters, input_dim)
+
+    self.kernel = self.add_variable(
+        'kernel',
+        shape=kernel_shape,
+        initializer=self.kernel_initializer,
+        regularizer=self.kernel_regularizer,
+        trainable=True,
+        dtype=self.dtype)
+    if self.use_bias:
+      self.bias = self.add_variable(
+          'bias',
+          shape=(self.filters,),
+          initializer=self.bias_initializer,
+          regularizer=self.bias_regularizer,
+          trainable=True,
+          dtype=self.dtype)
+    else:
+      self.bias = None
+
+  def call(self, inputs):
+    inputs_shape = array_ops.shape(inputs)
+    batch_size = inputs_shape[0]
+    if self.data_format == 'channels_first':
+      c_axis, d_axis, h_axis, w_axis = 1, 2, 3, 4
+    else:
+      c_axis, d_axis, h_axis, w_axis = 4, 1, 2, 3
+
+    depth = inputs_shape[d_axis]
+    height = inputs_shape[h_axis]
+    width = inputs_shape[w_axis]
+
+    kernel_d, kernel_h, kernel_w = self.kernel_size
+    stride_d, stride_h, stride_w = self.strides
+
+    # Infer the dynamic output shape:
+    out_depth = utils.get_deconv_dim(depth, stride_d, kernel_d, self.padding)
+    out_height = utils.get_deconv_dim(height, stride_h, kernel_h, self.padding)
+    out_width = utils.get_deconv_dim(width, stride_w, kernel_w, self.padding)
+
+    if self.data_format == 'channels_first':
+      output_shape = (batch_size, self.filters, out_depth, out_height,
+                      out_width)
+      strides = (1, 1, stride_d, stride_h, stride_w)
+    else:
+      output_shape = (batch_size, out_depth, out_height, out_width,
+                      self.filters)
+      strides = (1, stride_d, stride_h, stride_w, 1)
+
+    output_shape_tensor = array_ops.stack(output_shape)
+    outputs = nn.conv3d_transpose(
+        inputs,
+        self.kernel,
+        output_shape_tensor,
+        strides,
+        data_format=utils.convert_data_format(self.data_format, ndim=5),
+        padding=self.padding.upper())
+
+    # Infer the static output shape:
+    out_shape = inputs.get_shape().as_list()
+    out_shape[c_axis] = self.filters
+    out_shape[d_axis] = utils.get_deconv_dim(out_shape[d_axis], stride_d,
+                                             kernel_d, self.padding)
+    out_shape[h_axis] = utils.get_deconv_dim(out_shape[h_axis], stride_h,
+                                             kernel_h, self.padding)
+    out_shape[w_axis] = utils.get_deconv_dim(out_shape[w_axis], stride_w,
+                                             kernel_w, self.padding)
+    outputs.set_shape(out_shape)
+
+    if self.use_bias:
+      outputs_shape = outputs.shape.as_list()
+      if self.data_format == 'channels_first':
+        outputs_4d = array_ops.reshape(outputs, [
+            outputs_shape[0], outputs_shape[1],
+            outputs_shape[2] * outputs_shape[3], outputs_shape[4]
+        ])
+      else:
+        outputs_4d = array_ops.reshape(outputs, [
+            outputs_shape[0], outputs_shape[1] * outputs_shape[2],
+            outputs_shape[3], outputs_shape[4]
+        ])
+      outputs_4d = nn.bias_add(
+          outputs_4d,
+          self.bias,
+          data_format=utils.convert_data_format(self.data_format, ndim=4))
+      outputs = array_ops.reshape(outputs_4d, outputs_shape)
+
+    if self.activation is not None:
+      return self.activation(outputs)
+    return outputs
+
+
+def conv3d_transpose(inputs,
+                     filters,
+                     kernel_size,
+                     strides=(1, 1, 1),
+                     padding='valid',
+                     data_format='channels_last',
+                     activation=None,
+                     use_bias=True,
+                     kernel_initializer=None,
+                     bias_initializer=init_ops.zeros_initializer(),
+                     kernel_regularizer=None,
+                     bias_regularizer=None,
+                     activity_regularizer=None,
+                     trainable=True,
+                     name=None,
+                     reuse=None):
+  """Functional interface for transposed 3D convolution layer.
+
+  Arguments:
+    inputs: Input tensor.
+    filters: Integer, the dimensionality of the output space (i.e. the number
+      of filters in the convolution).
+    kernel_size: A tuple or list of 3 positive integers specifying the spatial
+      dimensions of the filters. Can be a single integer to specify the same
+      value for all spatial dimensions.
+    strides: A tuple or list of 3 positive integers specifying the strides
+      of the convolution. Can be a single integer to specify the same value for
+      all spatial dimensions.
+    padding: One of `"valid"` or `"same"` (case-insensitive).
+    data_format: A string, one of `channels_last` (default) or `channels_first`.
+      The ordering of the dimensions in the inputs.
+      `channels_last` corresponds to inputs with shape
+      `(batch, depth, height, width, channels)` while `channels_first`
+      corresponds to `(batch, channels, depth, height, width)`.
     activation: Activation function. Set it to None to maintain a
       linear activation.
     use_bias: Boolean, whether the layer uses a bias.
@@ -1195,7 +1429,7 @@ def conv2d_transpose(inputs,
   Returns:
     Output tensor.
   """
-  layer = Conv2DTranspose(
+  layer = Conv3DTranspose(
       filters=filters,
       kernel_size=kernel_size,
       strides=strides,
@@ -1222,8 +1456,10 @@ Convolution2D = Conv2D
 Convolution3D = Conv3D
 SeparableConvolution2D = SeparableConv2D
 Convolution2DTranspose = Deconvolution2D = Deconv2D = Conv2DTranspose
+Convolution3DTranspose = Deconvolution3D = Deconv3D = Conv3DTranspose
 convolution1d = conv1d
 convolution2d = conv2d
 convolution3d = conv3d
 separable_convolution2d = separable_conv2d
 convolution2d_transpose = deconvolution2d = deconv2d = conv2d_transpose
+convolution3d_transpose = deconvolution3d = deconv3d = conv3d_transpose
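`Conv3DTranspose.call` adds the bias in three steps: it merges two adjacent spatial axes, applies a 4-D `nn.bias_add`, and reshapes back to 5-D. Merging non-channel axes does not change which channel each element belongs to, so the result equals a direct broadcast add over the 5-D output. The NumPy sketch below checks that equivalence with arbitrary shapes. `bias_add_5d_via_4d` is a hypothetical helper, and a NumPy broadcast stands in for `nn.bias_add`.

```python
import numpy as np


def bias_add_5d_via_4d(x, bias, channels_first):
  """Adds a per-channel bias to a 5-D array by way of a 4-D reshape."""
  n, a, b, c, d = x.shape
  if channels_first:
    # (batch, channels, depth, height, width): merge depth and height.
    x4 = x.reshape(n, a, b * c, d)
    x4 = x4 + bias.reshape(1, -1, 1, 1)  # bias along axis 1 (NCHW-style)
  else:
    # (batch, depth, height, width, channels): merge depth and height.
    x4 = x.reshape(n, a * b, c, d)
    x4 = x4 + bias  # bias along the last axis (NHWC-style)
  return x4.reshape(x.shape)


rng = np.random.RandomState(0)
bias = rng.randn(4)

x_last = rng.randn(2, 3, 5, 6, 4)   # channels_last, 4 channels
assert np.allclose(bias_add_5d_via_4d(x_last, bias, False), x_last + bias)

x_first = rng.randn(2, 4, 3, 5, 6)  # channels_first, 4 channels
expected_first = x_first + bias.reshape(1, 4, 1, 1, 1)
assert np.allclose(bias_add_5d_via_4d(x_first, bias, True), expected_first)
print("4-D reshape matches a direct 5-D bias add")
```

The layer builds its reshape from `outputs.shape.as_list()`, the static shape. This path therefore appears to assume that every output dimension, including the batch size, is known when the graph is built.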
diff --git a/tensorflow/python/layers/convolutional_test.py b/tensorflow/python/layers/convolutional_test.py
index da962b2f99..42a2d77534 100644
--- a/tensorflow/python/layers/convolutional_test.py
+++ b/tensorflow/python/layers/convolutional_test.py
@@ -651,5 +651,174 @@ class Conv2DTransposeTest(test.TestCase):
     self.assertEqual(len(variables.trainable_variables()), 4)
 
 
+class Conv3DTransposeTest(test.TestCase):
+
+  def testInvalidDataFormat(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    with self.assertRaisesRegexp(ValueError, 'data_format'):
+      conv_layers.conv3d_transpose(volumes, 4, 3, data_format='invalid')
+
+  def testInvalidStrides(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    with self.assertRaisesRegexp(ValueError, 'strides'):
+      conv_layers.conv3d_transpose(volumes, 4, 3, strides=(1, 2))
+
+    with self.assertRaisesRegexp(ValueError, 'strides'):
+      conv_layers.conv3d_transpose(volumes, 4, 3, strides=None)
+
+  def testInvalidKernelSize(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    with self.assertRaisesRegexp(ValueError, 'kernel_size'):
+      conv_layers.conv3d_transpose(volumes, 4, (1, 2))
+
+    with self.assertRaisesRegexp(ValueError, 'kernel_size'):
+      conv_layers.conv3d_transpose(volumes, 4, None)
+
+  def testCreateConv3DTranspose(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], activation=nn_ops.relu)
+    output = layer.apply(volumes)
+    self.assertEqual(output.op.name, 'conv3d_transpose/Relu')
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth + 2, height + 2, width + 2, 4])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertListEqual(layer.bias.get_shape().as_list(), [4])
+
+  def testCreateConv3DTransposeIntegerKernelSize(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    layer = conv_layers.Conv3DTranspose(4, 3)
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth + 2, height + 2, width + 2, 4])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertListEqual(layer.bias.get_shape().as_list(), [4])
+
+  def testCreateConv3DTransposeChannelsFirst(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, 32, depth, height, width))
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], data_format='channels_first')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, 4, depth + 2, height + 2, width + 2])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertListEqual(layer.bias.get_shape().as_list(), [4])
+
+  def testConv3DTransposePaddingSame(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 64), seed=1)
+    layer = conv_layers.Conv3DTranspose(
+        32, volumes.get_shape()[1:4], padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth, height, width, 32])
+
+  def testCreateConv3DTransposeWithStrides(self):
+    depth, height, width = 4, 6, 8
+    # Test strides tuple.
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], strides=(2, 2, 2), padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth * 2, height * 2, width * 2, 4])
+
+    # Test strides integer.
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], strides=2, padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth * 2, height * 2, width * 2, 4])
+
+    # Test unequal strides.
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], strides=(2, 1, 1), padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth * 2, height, width, 4])
+
+  def testConv3DTransposeKernelRegularizer(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    reg = lambda x: 0.1 * math_ops.reduce_sum(x)
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], kernel_regularizer=reg)
+    layer.apply(volumes)
+    loss_keys = ops.get_collection(ops.GraphKeys.REGULARIZATION_LOSSES)
+    self.assertEqual(len(loss_keys), 1)
+    self.assertListEqual(layer.losses, loss_keys)
+
+  def testConv3DTransposeBiasRegularizer(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    reg = lambda x: 0.1 * math_ops.reduce_sum(x)
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], bias_regularizer=reg)
+    layer.apply(volumes)
+    loss_keys = ops.get_collection(ops.GraphKeys.REGULARIZATION_LOSSES)
+    self.assertEqual(len(loss_keys), 1)
+    self.assertListEqual(layer.losses, loss_keys)
+
+  def testConv3DTransposeNoBias(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], activation=nn_ops.relu, use_bias=False)
+    output = layer.apply(volumes)
+    self.assertEqual(output.op.name, 'conv3d_transpose/Relu')
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth + 2, height + 2, width + 2, 4])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertEqual(layer.bias, None)
+
+  def testFunctionalConv3DTransposeReuse(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+    self.assertEqual(len(variables.trainable_variables()), 2)
+    conv_layers.conv3d_transpose(
+        volumes, 4, [3, 3, 3], name='deconv1', reuse=True)
+    self.assertEqual(len(variables.trainable_variables()), 2)
+
+  def testFunctionalConv3DTransposeReuseFromScope(self):
+    with variable_scope.variable_scope('scope'):
+      depth, height, width = 5, 7, 9
+      volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+      conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+      self.assertEqual(len(variables.trainable_variables()), 2)
+    with variable_scope.variable_scope('scope', reuse=True):
+      conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+      self.assertEqual(len(variables.trainable_variables()), 2)
+
+  def testFunctionalConv3DTransposeInitializerFromScope(self):
+    with self.test_session() as sess:
+      with variable_scope.variable_scope(
+          'scope', initializer=init_ops.ones_initializer()):
+        depth, height, width = 5, 7, 9
+        volumes = random_ops.random_uniform(
+            (5, depth, height, width, 32), seed=1)
+        conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+        weights = variables.trainable_variables()
+        # Check the names of weights in order.
+        self.assertTrue('kernel' in weights[0].name)
+        self.assertTrue('bias' in weights[1].name)
+        sess.run(variables.global_variables_initializer())
+        weights = sess.run(weights)
+        # Check that the kernel weights got initialized to ones (from scope)
+        self.assertAllClose(weights[0], np.ones((3, 3, 3, 4, 32)))
+        # Check that the bias still got initialized to zeros.
+        self.assertAllClose(weights[1], np.zeros((4)))
+
+  def testFunctionalConv3DTransposeNoReuse(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3])
+    self.assertEqual(len(variables.trainable_variables()), 2)
+    conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3])
+    self.assertEqual(len(variables.trainable_variables()), 4)
+
+
 if __name__ == '__main__':
   test.main()
diff --git a/tensorflow/python/layers/layers.py b/tensorflow/python/layers/layers.py
index 9f02757d5b..aa46eb5d27 100644
--- a/tensorflow/python/layers/layers.py
+++ b/tensorflow/python/layers/layers.py
@@ -23,6 +23,7 @@
 @@conv3d
 @@separable_conv2d
 @@conv2d_transpose
+@@conv3d_transpose
 @@average_pooling1d
 @@max_pooling1d
 @@average_pooling2d
@@ -50,6 +51,7 @@ from tensorflow.python.layers.convolutional import conv2d
 from tensorflow.python.layers.convolutional import conv3d
 from tensorflow.python.layers.convolutional import separable_conv2d
 from tensorflow.python.layers.convolutional import conv2d_transpose
+from tensorflow.python.layers.convolutional import conv3d_transpose
 
 # Pooling layers.
 from tensorflow.python.layers.pooling import average_pooling1d
diff --git a/tensorflow/python/layers/utils.py b/tensorflow/python/layers/utils.py
index 666d475690..64b948c70f 100644
--- a/tensorflow/python/layers/utils.py
+++ b/tensorflow/python/layers/utils.py
@@ -26,6 +26,7 @@ import numpy as np
 
 from tensorflow.python.ops import variables
 from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
 from tensorflow.python.framework import ops
 from tensorflow.python.framework import tensor_util
 
@@ -164,3 +165,28 @@ def constant_value(pred):
   else:
     raise TypeError('`pred` must be a Tensor, a Variable, or a Python bool.')
   return pred_value
+
+
+def get_deconv_dim(dim_size, stride_size, kernel_size, padding):
+  """Return output dimension of a deconv layer, based on input dimension.
+
+  Arguments:
+    dim_size: An int, a scalar `Tensor`, or `None` representing the size of
+      the dimension (height, width or depth).
+    stride_size: An int representing the stride of deconvolution filters
+      along the same dimension.
+    kernel_size: An int representing the size of the deconv kernel (filter)
+      along the same dimension.
+    padding: One of `"valid"` or `"same"` (lowercase; compared exactly).
+
+  Returns:
+    The output dimension size: an int, a `Tensor`, or `None`, like `dim_size`.
+  """
+  if isinstance(dim_size, ops.Tensor):
+    dim_size = math_ops.multiply(dim_size, stride_size)
+  elif dim_size is not None:
+    dim_size *= stride_size
+
+  if padding == 'valid' and dim_size is not None:
+    dim_size += max(kernel_size - stride_size, 0)
+  return dim_size
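For static integer sizes, `get_deconv_dim` reduces to two cases:

- `'same'` padding: `dim * stride`.
- `'valid'` padding: `dim * stride + max(kernel - stride, 0)`.

Below is a plain-Python sketch of that arithmetic, checked against values from `testGetDeconvDim` and the `depth + 2` shapes in `Conv3DTransposeTest`. The name `deconv_dim` is hypothetical, and the sketch leaves out the `Tensor` and `None` branches.

```python
def deconv_dim(dim_size, stride_size, kernel_size, padding):
  """Static-shape version of the output-size rule in get_deconv_dim."""
  out = dim_size * stride_size
  if padding == 'valid':
    # With 'valid' padding the output also covers the part of the kernel
    # that extends past the last stride step.
    out += max(kernel_size - stride_size, 0)
  return out


for args, expected in [((30, 1, 3, 'valid'), 32), ((28, 2, 5, 'valid'), 59),
                       ((32, 2, 5, 'same'), 64), ((5, 1, 3, 'valid'), 7)]:
  assert deconv_dim(*args) == expected
  print(args, '->', deconv_dim(*args))
```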
diff --git a/tensorflow/python/layers/utils_test.py b/tensorflow/python/layers/utils_test.py
index 54e757c112..7969e957d8 100644
--- a/tensorflow/python/layers/utils_test.py
+++ b/tensorflow/python/layers/utils_test.py
@@ -62,6 +62,13 @@ class ConvUtilsTest(test.TestCase):
     with self.assertRaises(ValueError):
       utils.normalize_padding('invalid')
 
+  def testGetDeconvDim(self):
+    self.assertEqual(utils.get_deconv_dim(30, 1, 3, 'valid'), 32)
+    self.assertEqual(utils.get_deconv_dim(28, 1, 5, 'valid'), 32)
+    self.assertEqual(utils.get_deconv_dim(28, 2, 5, 'valid'), 59)
+    self.assertEqual(utils.get_deconv_dim(32, 1, 3, 'same'), 32)
+    self.assertEqual(utils.get_deconv_dim(32, 1, 5, 'same'), 32)
+    self.assertEqual(utils.get_deconv_dim(32, 2, 5, 'same'), 64)
 
 if __name__ == '__main__':
   test.main()
diff --git a/tensorflow/python/ops/array_ops.py b/tensorflow/python/ops/array_ops.py
index 77db63dd0f..5ac630c321 100644
--- a/tensorflow/python/ops/array_ops.py
+++ b/tensorflow/python/ops/array_ops.py
@@ -470,7 +470,10 @@ def _SliceHelper(tensor, slice_spec, var=None):
     else:
       begin.append(s)
       end.append(s + 1)
-      strides.append(1)
+      if isinstance(s, ops.Tensor):
+        strides.append(constant(1, s.dtype))
+      else:
+        strides.append(np.ones_like(s).dtype.type(1))
       shrink_axis_mask |= (1 << index)
     index += 1
 
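The `_SliceHelper` change gives the stride for an integer index the same dtype as that index. `strided_slice` takes `begin`, `end`, and `strides` with a single shared index type, so an `int64` index paired with an `int32` stride would mismatch. For `Tensor` indices the change uses `constant(1, s.dtype)`. For everything else it relies on `np.ones_like(s).dtype.type(1)` producing a scalar `1` of the same NumPy type as `s`, which the following check illustrates:

```python
import numpy as np

# Builds the stride value the non-Tensor branch would use for each index.
for s in (np.int32(3), np.int64(3), 3):
  one = np.ones_like(s).dtype.type(1)
  # NumPy scalar types are preserved; a plain Python int maps to NumPy's
  # default integer type, which is platform dependent.
  print(type(s).__name__, '->', type(one).__name__)
```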
diff --git a/tensorflow/python/ops/ctc_ops.py b/tensorflow/python/ops/ctc_ops.py
index b0a1fc3dd1..ee5b2952f4 100644
--- a/tensorflow/python/ops/ctc_ops.py
+++ b/tensorflow/python/ops/ctc_ops.py
@@ -197,7 +197,7 @@ def ctc_greedy_decoder(inputs, sequence_length, merge_repeated=True):
     merge_repeated: Boolean.  Default: True.
 
   Returns:
-    A tuple `(decoded, log_probabilities)` where
+    A tuple `(decoded, neg_sum_logits)` where
     decoded: A single-element list. `decoded[0]`
       is an `SparseTensor` containing the decoded outputs s.t.:
       `decoded.indices`: Indices matrix `(total_decoded_outputs x 2)`.
@@ -206,8 +206,9 @@ def ctc_greedy_decoder(inputs, sequence_length, merge_repeated=True):
         The vector stores the decoded classes.
       `decoded.shape`: Shape vector, size `(2)`.
         The shape values are: `[batch_size, max_decoded_length]`
-    log_probability: A `float` matrix `(batch_size x 1)` containing sequence
-        log-probabilities.
+    neg_sum_logits: A `float` matrix `(batch_size x 1)` containing, for the
+        sequence found, the negative of the sum of the greatest logit at each
+        time step.
   """
   outputs = gen_ctc_ops._ctc_greedy_decoder(
       inputs, sequence_length, merge_repeated=merge_repeated)
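The renamed second output describes what the greedy decoder computes. At each time step it picks the highest-scoring class and accumulates that logit, so the value is the negative sum of per-step maxima rather than a sequence log-probability. Below is a NumPy sketch for a single sequence. It assumes, as TensorFlow's CTC ops do, that the blank is the last class. `greedy_decode` is a hypothetical helper, not the library op.

```python
import numpy as np


def greedy_decode(logits, merge_repeated=True):
  """Greedy CTC decoding for one sequence of shape (time, num_classes)."""
  blank = logits.shape[1] - 1
  decoded, prev = [], None
  neg_sum_logits = 0.0
  for step in logits:
    best = int(np.argmax(step))
    neg_sum_logits -= step[best]
    # Skip blanks and, optionally, repeats of the previous step's class.
    # prev is updated on blanks too, so a blank separates equal labels.
    if best != blank and not (merge_repeated and best == prev):
      decoded.append(best)
    prev = best
  return decoded, float(neg_sum_logits)


logits = np.array([[0.9, 0.1, 0.0],   # class 0
                   [0.8, 0.1, 0.1],   # class 0 again (merged)
                   [0.1, 0.1, 0.8],   # blank
                   [0.2, 0.7, 0.1]])  # class 1
print(greedy_decode(logits))
```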
diff --git a/tensorflow/python/ops/init_ops.py b/tensorflow/python/ops/init_ops.py
index 67fff9c803..42b4f952bb 100644
--- a/tensorflow/python/ops/init_ops.py
+++ b/tensorflow/python/ops/init_ops.py
@@ -39,6 +39,7 @@ from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
 from tensorflow.python.ops import array_ops
 from tensorflow.python.ops import linalg_ops
+from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import random_ops
 
 
@@ -487,16 +488,18 @@ class Orthogonal(Initializer):
     flat_shape = (num_rows, num_cols)
 
     # Generate a random matrix
-    a = random_ops.random_uniform(flat_shape, dtype=dtype, seed=self.seed)
-    # Compute the svd
-    _, u, v = linalg_ops.svd(a, full_matrices=False)
-    # Pick the appropriate singular value decomposition
-    if num_rows > num_cols:
-      q = u
-    else:
-      # Tensorflow departs from numpy conventions
-      # such that we need to transpose axes here
-      q = array_ops.transpose(v)
+    a = random_ops.random_normal(flat_shape, dtype=dtype, seed=self.seed)
+    # Compute the qr factorization
+    q, r = linalg_ops.qr(a, full_matrices=False)
+    # Make Q uniform
+    square_len = math_ops.minimum(num_rows, num_cols)
+    d = array_ops.diag_part(r[:square_len, :square_len])
+    ph = d / math_ops.abs(d)
+    q *= ph
+    # Pad zeros to Q (if rows smaller than cols)
+    if num_rows < num_cols:
+      padding = array_ops.zeros([num_rows, num_cols - num_rows], dtype=dtype)
+      q = array_ops.concat([q, padding], 1)
     return self.gain * array_ops.reshape(q, shape)
 
   def get_config(self):
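The new `Orthogonal` code draws a Gaussian matrix, takes its reduced QR factorization, and multiplies the columns of Q by the signs of R's diagonal so that Q is uniformly distributed. Householder-based QR routines may return negative diagonal entries in R, which is why that correction is needed. The pure-Python sketch below uses modified Gram-Schmidt instead (a swapped-in technique), where R's diagonal is positive by construction, so the sign correction is implicit. It only covers `num_rows >= num_cols`; the diff pads Q with zero columns in the other case. `orthogonal_matrix` is a hypothetical name.

```python
import math
import random


def orthogonal_matrix(num_rows, num_cols, gain=1.0, seed=None):
    """Return a num_rows x num_cols matrix (list of rows) with orthonormal columns.

    Only the num_rows >= num_cols case is handled in this sketch.
    """
    if num_rows < num_cols:
        raise ValueError("this sketch only handles num_rows >= num_cols")
    rng = random.Random(seed)
    q_cols = []
    for _ in range(num_cols):
        # Draw one column of a standard-normal random matrix.
        w = [rng.gauss(0.0, 1.0) for _ in range(num_rows)]
        # Modified Gram-Schmidt: remove components along earlier Q columns.
        for q in q_cols:
            proj = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        # This norm is the diagonal entry R[j][j]; it is positive here, so the
        # sign correction d / |d| from the diff would leave the column unchanged.
        norm = math.sqrt(sum(wi * wi for wi in w))
        q_cols.append([wi / norm for wi in w])
    # Convert the list of columns back to rows and apply the gain.
    return [[gain * q_cols[j][i] for j in range(num_cols)] for i in range(num_rows)]
```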
diff --git a/tensorflow/python/ops/logging_ops.py b/tensorflow/python/ops/logging_ops.py
index 9c49ef7889..dbb13e0c8f 100644
--- a/tensorflow/python/ops/logging_ops.py
+++ b/tensorflow/python/ops/logging_ops.py
@@ -124,7 +124,7 @@ def image_summary(tag, tensor, max_images=3, collections=None, name=None):
   """Outputs a `Summary` protocol buffer with images.
 
   For an explanation of why this op was deprecated, and information on how to
-  migrate, look ['here'](https://www.tensorflow.org/code/tensorflow/contrib/deprecated/__init__.py)
+  migrate, look ['here'](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/deprecated/__init__.py)
 
   The summary has up to `max_images` summary values containing images. The
   images are built from `tensor` which must be 4-D with shape `[batch_size,
diff --git a/tensorflow/python/ops/math_grad.py b/tensorflow/python/ops/math_grad.py
index 409e3c5111..024158e709 100644
--- a/tensorflow/python/ops/math_grad.py
+++ b/tensorflow/python/ops/math_grad.py
@@ -613,6 +613,16 @@ def _AtanGrad(op, grad):
     return grad * inv
 
 
+@ops.RegisterGradient("Atan2")
+def _Atan2Grad(op, grad):
+  """Returns grad * x / (x^2 + y^2), grad * -y / (x^2 + y^2)."""
+  y = op.inputs[0]
+  x = op.inputs[1]
+  with ops.control_dependencies([grad.op]):
+    grad_inv = grad / (math_ops.square(x) + math_ops.square(y))
+    return x * grad_inv, -y * grad_inv
+
+
 @ops.RegisterGradient("AddN")
 def _AddNGrad(op, grad):
   """Copies the gradient to all inputs."""
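`_Atan2Grad` uses d/dy atan2(y, x) = x / (x^2 + y^2) and d/dx atan2(y, x) = -y / (x^2 + y^2), with `op.inputs[0]` as `y` and `op.inputs[1]` as `x` (matching the `tf.atan2(y, x)` argspec added to the API golden file). A quick sketch that checks the formula against central finite differences of Python's `math.atan2`; both helper names are hypothetical.

```python
import math


def atan2_grad(y, x, grad=1.0):
    """Gradients of atan2(y, x) w.r.t. (y, x), using the same formula as _Atan2Grad."""
    grad_inv = grad / (x * x + y * y)
    return x * grad_inv, -y * grad_inv


def numeric_grad(y, x, eps=1e-6):
    """Central finite differences of math.atan2 for comparison."""
    dy = (math.atan2(y + eps, x) - math.atan2(y - eps, x)) / (2 * eps)
    dx = (math.atan2(y, x + eps) - math.atan2(y, x - eps)) / (2 * eps)
    return dy, dx
```

The points used in a check should stay away from the origin and from the branch cut along the negative x-axis, where atan2 is not differentiable.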
diff --git a/tensorflow/python/ops/math_ops.py b/tensorflow/python/ops/math_ops.py
index 92e8723ced..b762205cf7 100644
--- a/tensorflow/python/ops/math_ops.py
+++ b/tensorflow/python/ops/math_ops.py
@@ -56,6 +56,7 @@ See the @{$python/math_ops} guide.
 @@acos
 @@asin
 @@atan
+@@atan2
 @@lgamma
 @@digamma
 @@erf
@@ -824,7 +825,16 @@ def _OverrideBinaryOperatorHelper(func, op_name, clazz_object=ops.Tensor):
   def binary_op_wrapper(x, y):
     with ops.name_scope(None, op_name, [x, y]) as name:
       if not isinstance(y, sparse_tensor.SparseTensor):
-        y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
+        try:
+          y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
+        except TypeError:
+          # If the RHS cannot be converted to a tensor, it might be a
+          # tensor-aware object that implements the reflected operator
+          # (e.g. __radd__) with knowledge of itself and the tensor.
+          if hasattr(type(y), "__r%s__" % op_name):
+            return NotImplemented
+          else:
+            raise
       return func(x, y, name=name)
 
   def binary_op_wrapper_sparse(sp_x, y):
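Returning `NotImplemented` from a binary operator tells Python to try the right operand's reflected method (for example `__radd__`) instead of failing. The sketch below reproduces the pattern of the new `binary_op_wrapper` with hypothetical stand-ins: `Tensorish` plays the role of `ops.Tensor` and `_convert` plays the role of `convert_to_tensor`.

```python
class Tensorish(object):
    """Hypothetical stand-in for ops.Tensor with a strict conversion step."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        try:
            other_value = _convert(other)
        except TypeError:
            # Defer to the right operand if its type defines __radd__.
            if hasattr(type(other), "__radd__"):
                return NotImplemented
            raise
        return Tensorish(self.value + other_value)


def _convert(obj):
    """Hypothetical convert_to_tensor: accepts only numbers and Tensorish."""
    if isinstance(obj, Tensorish):
        return obj.value
    if isinstance(obj, (int, float)):
        return obj
    raise TypeError("cannot convert %r" % (obj,))


class TensorAware(object):
    """Hypothetical object that knows how to combine itself with Tensorish."""

    def __radd__(self, other):
        return "TensorAware handled %s + self" % other.value
```

Objects whose type has no reflected method (a `str`, for instance) still see the original `TypeError`, so genuine conversion errors are not masked.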
diff --git a/tensorflow/python/ops/nn_impl.py b/tensorflow/python/ops/nn_impl.py
index 7c17cf2cb6..0a00e3d765 100644
--- a/tensorflow/python/ops/nn_impl.py
+++ b/tensorflow/python/ops/nn_impl.py
@@ -301,9 +301,8 @@ def zero_fraction(value, name=None):
   This is useful in summaries to measure and report sparsity.  For example,
 
   ```python
-      z = tf.Relu(...)
-      summ = tf.contrib.deprecated.scalar_summary('sparsity',
-      tf.nn.zero_fraction(z))
+      z = tf.nn.relu(...)
+      summ = tf.summary.scalar('sparsity', tf.nn.zero_fraction(z))
   ```
 
   Args:
diff --git a/tensorflow/python/ops/nn_ops.py b/tensorflow/python/ops/nn_ops.py
index ccce9402c7..61fda3a798 100644
--- a/tensorflow/python/ops/nn_ops.py
+++ b/tensorflow/python/ops/nn_ops.py
@@ -840,6 +840,11 @@ def pool(input,  # pylint: disable=redefined-builtin
 def atrous_conv2d(value, filters, rate, padding, name=None):
   """Atrous convolution (a.k.a. convolution with holes or dilated convolution).
 
+  This function is a simpler wrapper around the more general
+  @{tf.nn.convolution}, and exists only for backwards compatibility. You can
+  use @{tf.nn.convolution} to perform 1-D, 2-D, or 3-D atrous convolution.
+
+
   Computes a 2-D atrous convolution, also known as convolution with holes or
   dilated convolution, given 4-D `value` and `filters` tensors. If the `rate`
   parameter is equal to one, it performs regular 2-D convolution. If the `rate`
@@ -959,93 +964,12 @@ def atrous_conv2d(value, filters, rate, padding, name=None):
     ValueError: If input/output depth does not match `filters`' shape, or if
       padding is other than `'VALID'` or `'SAME'`.
   """
-  with ops.name_scope(name, "atrous_conv2d", [value, filters]) as name:
-    value = ops.convert_to_tensor(value, name="value")
-    filters = ops.convert_to_tensor(filters, name="filters")
-    if not value.get_shape()[3].is_compatible_with(filters.get_shape()[2]):
-      raise ValueError(
-          "value's input channels does not match filters' input channels, "
-          "{} != {}".format(value.get_shape()[3], filters.get_shape()[2]))
-    if rate < 1:
-      raise ValueError("rate {} cannot be less than one".format(rate))
-
-    if rate == 1:
-      value = gen_nn_ops.conv2d(input=value,
-                                filter=filters,
-                                strides=[1, 1, 1, 1],
-                                padding=padding)
-      return value
-
-    # We have two padding contributions. The first is used for converting "SAME"
-    # to "VALID". The second is required so that the height and width of the
-    # zero-padded value tensor are multiples of rate.
-
-    # Padding required to reduce to "VALID" convolution
-    if padding == "SAME":
-      # Handle filters whose shape is unknown during graph creation.
-      if filters.get_shape().is_fully_defined():
-        filter_shape = filters.get_shape().as_list()
-      else:
-        filter_shape = array_ops.shape(filters)
-      filter_height, filter_width = filter_shape[0], filter_shape[1]
-
-      # Spatial dimensions of the filters and the upsampled filters in which we
-      # introduce (rate - 1) zeros between consecutive filter values.
-      filter_height_up = filter_height + (filter_height - 1) * (rate - 1)
-      filter_width_up = filter_width + (filter_width - 1) * (rate - 1)
-
-      pad_height = filter_height_up - 1
-      pad_width = filter_width_up - 1
-
-      # When pad_height (pad_width) is odd, we pad more to bottom (right),
-      # following the same convention as conv2d().
-      pad_top = pad_height // 2
-      pad_bottom = pad_height - pad_top
-      pad_left = pad_width // 2
-      pad_right = pad_width - pad_left
-    elif padding == "VALID":
-      pad_top = 0
-      pad_bottom = 0
-      pad_left = 0
-      pad_right = 0
-    else:
-      raise ValueError("Invalid padding")
-
-    # Handle input whose shape is unknown during graph creation.
-    if value.get_shape().is_fully_defined():
-      value_shape = value.get_shape().as_list()
-    else:
-      value_shape = array_ops.shape(value)
-
-    in_height = value_shape[1] + pad_top + pad_bottom
-    in_width = value_shape[2] + pad_left + pad_right
-
-    # More padding so that rate divides the height and width of the input.
-    pad_bottom_extra = (rate - in_height % rate) % rate
-    pad_right_extra = (rate - in_width % rate) % rate
-
-    # The paddings argument to space_to_batch includes both padding components.
-    space_to_batch_pad = [[pad_top, pad_bottom + pad_bottom_extra],
-                          [pad_left, pad_right + pad_right_extra]]
-
-    value = array_ops.space_to_batch(input=value,
-                                     paddings=space_to_batch_pad,
-                                     block_size=rate)
-
-    value = gen_nn_ops.conv2d(input=value,
-                              filter=filters,
-                              strides=[1, 1, 1, 1],
-                              padding="VALID",
-                              name=name)
-
-    # The crops argument to batch_to_space is just the extra padding component.
-    batch_to_space_crop = [[0, pad_bottom_extra], [0, pad_right_extra]]
-
-    value = array_ops.batch_to_space(input=value,
-                                     crops=batch_to_space_crop,
-                                     block_size=rate)
-
-    return value
+  return convolution(
+      input=value,
+      filter=filters,
+      padding=padding,
+      dilation_rate=np.broadcast_to(rate, (2,)),
+      name=name)
 
 
 def conv2d_transpose(value,
@@ -1272,7 +1196,7 @@ def conv3d_transpose(value,
                      output_shape,
                      strides,
                      padding="SAME",
-                     data_format=None,
+                     data_format="NDHWC",
                      name=None):
   """The transpose of `conv3d`.
 
@@ -1308,10 +1232,11 @@ def conv3d_transpose(value,
                       [value, filter, output_shape]) as name:
     value = ops.convert_to_tensor(value, name="value")
     filter = ops.convert_to_tensor(filter, name="filter")
-    if not value.get_shape()[4].is_compatible_with(filter.get_shape()[4]):
+    axis = 1 if data_format == "NCDHW" else 4
+    if not value.get_shape()[axis].is_compatible_with(filter.get_shape()[4]):
       raise ValueError("input channels does not match filter's input channels, "
-                       "{} != {}".format(value.get_shape()[4], filter.get_shape(
-                       )[4]))
+                       "{} != {}".format(value.get_shape()[axis],
+                                         filter.get_shape()[4]))
 
     output_shape_ = ops.convert_to_tensor(output_shape, name="output_shape")
     if not output_shape_.get_shape().is_compatible_with(tensor_shape.vector(5)):
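`atrous_conv2d` now delegates to `convolution` with `dilation_rate=np.broadcast_to(rate, (2,))`, i.e. the same rate along height and width. The removed code spelled out the underlying arithmetic: a `k`-tap filter with `rate - 1` zeros between taps spans `k + (k - 1) * (rate - 1)` inputs, and `"SAME"` padding adds that span minus one, with any odd remainder going to the bottom/right. A 1-D pure-Python sketch of those rules (hypothetical helpers, computing cross-correlation without flipping, as TensorFlow's conv ops do):

```python
def effective_filter_size(k, rate):
    """Span of a k-tap filter after inserting (rate - 1) zeros between taps."""
    return k + (k - 1) * (rate - 1)


def same_padding(k, rate):
    """(before, after) padding that turns 'SAME' into 'VALID' along one axis."""
    total = effective_filter_size(k, rate) - 1
    before = total // 2
    return before, total - before  # the extra element goes after (bottom/right)


def dilated_conv1d_valid(signal, taps, rate):
    """1-D 'VALID' dilated cross-correlation."""
    if rate < 1:
        raise ValueError("rate {} cannot be less than one".format(rate))
    span = effective_filter_size(len(taps), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Read the input every `rate` elements under the filter.
        out.append(sum(w * signal[start + j * rate] for j, w in enumerate(taps)))
    return out
```

With `rate == 1` this reduces to an ordinary convolution, which matches the docstring's statement that a rate of one performs regular 2-D convolution.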
diff --git a/tensorflow/python/ops/parsing_ops.py b/tensorflow/python/ops/parsing_ops.py
index b1994acdc4..796ea20eb7 100644
--- a/tensorflow/python/ops/parsing_ops.py
+++ b/tensorflow/python/ops/parsing_ops.py
@@ -845,7 +845,7 @@ def parse_single_sequence_example(
   Parses a single serialized [`SequenceExample`](https://www.tensorflow.org/code/tensorflow/core/example/example.proto)
   proto given in `serialized`.
 
-  This op parses a serialize sequence example into a tuple of dictionaries
+  This op parses a serialized sequence example into a tuple of dictionaries
   mapping keys to `Tensor` and `SparseTensor` objects respectively.
   The first dictionary contains mappings for keys appearing in
   `context_features`, and the second dictionary contains mappings for keys
diff --git a/tensorflow/python/ops/random_ops.py b/tensorflow/python/ops/random_ops.py
index 19689622b1..f06980783a 100644
--- a/tensorflow/python/ops/random_ops.py
+++ b/tensorflow/python/ops/random_ops.py
@@ -324,7 +324,7 @@ def multinomial(logits, num_samples, seed=None, name=None):
 
   Args:
     logits: 2-D Tensor with shape `[batch_size, num_classes]`.  Each slice
-      `[i, :]` represents the unnormalized log probabilities for all classes.
+      `[i, :]` represents the log-odds for all classes.
     num_samples: 0-D.  Number of independent samples to draw for each row slice.
     seed: A Python integer. Used to create a random seed for the distribution.
       See
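Each row of `logits` acts as a vector of unnormalized log-probabilities: the sampling distribution is the softmax of the row, so adding the same constant to every entry leaves it unchanged. The sketch below draws class indices that way using inverse-CDF sampling in pure Python; it illustrates the semantics only and is not how the `multinomial` kernel is implemented. `softmax` and `sample_from_logits` are hypothetical helpers.

```python
import math
import random


def softmax(logits):
    """Normalize unnormalized log-probabilities into probabilities."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_logits(logits, num_samples, seed=None):
    """Draw num_samples class indices i.i.d. with P(i) = softmax(logits)[i]."""
    rng = random.Random(seed)
    probs = softmax(logits)
    samples = []
    for _ in range(num_samples):
        u = rng.random()
        cumulative = 0.0
        chosen = len(probs) - 1  # fallback in case rounding leaves the CDF < 1
        for i, p in enumerate(probs):
            cumulative += p
            if u < cumulative:
                chosen = i
                break
        samples.append(chosen)
    return samples
```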
diff --git a/tensorflow/python/summary/writer/writer_cache.py b/tensorflow/python/summary/writer/writer_cache.py
index 21870e788e..bad289303c 100644
--- a/tensorflow/python/summary/writer/writer_cache.py
+++ b/tensorflow/python/summary/writer/writer_cache.py
@@ -39,6 +39,10 @@ class FileWriterCache(object):
   def clear():
     """Clear cached summary writers. Currently only used for unit tests."""
     with FileWriterCache._lock:
+      # Make sure all the writers are closed now (otherwise open file handles
+      # may hang around, blocking deletions on Windows).
+      for item in FileWriterCache._cache.values():
+        item.close()
       FileWriterCache._cache = {}
 
   @staticmethod
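The `clear()` change closes every cached writer before dropping the cache, so no file handle outlives its entry (on Windows an open handle blocks deleting the file). A minimal sketch of the same pattern; `FakeWriter` and `WriterCache` are hypothetical stand-ins for `FileWriter` and `FileWriterCache`.

```python
import threading


class FakeWriter(object):
    """Hypothetical stand-in for a summary FileWriter holding a file handle."""

    def __init__(self, logdir):
        self.logdir = logdir
        self.closed = False

    def close(self):
        self.closed = True


class WriterCache(object):
    """Per-logdir writer cache that closes writers before forgetting them."""

    _cache = {}
    _lock = threading.Lock()

    @staticmethod
    def get(logdir):
        with WriterCache._lock:
            if logdir not in WriterCache._cache:
                WriterCache._cache[logdir] = FakeWriter(logdir)
            return WriterCache._cache[logdir]

    @staticmethod
    def clear():
        with WriterCache._lock:
            # Close first so no open handle survives once the entry is gone.
            for writer in WriterCache._cache.values():
                writer.close()
            WriterCache._cache = {}
```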
diff --git a/tensorflow/python/tools/import_pb_to_tensorboard.py b/tensorflow/python/tools/import_pb_to_tensorboard.py
new file mode 100644
index 0000000000..caeb04a24b
--- /dev/null
+++ b/tensorflow/python/tools/import_pb_to_tensorboard.py
@@ -0,0 +1,50 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ================================
+"""Imports a protobuf model as a graph in Tensorboard."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.core.framework import graph_pb2
+from tensorflow.python.client import session
+from tensorflow.python.framework import importer
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import gfile
+from tensorflow.python.summary import summary
+
+
+def import_to_tensorboard(model_dir, log_dir):
+  """View an imported protobuf model (`.pb` file) as a graph in Tensorboard.
+
+  Args:
+    model_dir: Path to the protobuf (`.pb`) model file to visualize.
+    log_dir: Directory in which to write the TensorBoard event file.
+
+  Usage:
+    Call this function with your model location and desired log directory.
+    Launch Tensorboard by pointing it to the log directory.
+    View your imported `.pb` model as a graph.
+  """
+  with session.Session(graph=ops.Graph()) as sess:
+    with gfile.FastGFile(model_dir, "rb") as f:
+      graph_def = graph_pb2.GraphDef()
+      graph_def.ParseFromString(f.read())
+      importer.import_graph_def(graph_def)
+
+    pb_visual_writer = summary.FileWriter(log_dir)
+    pb_visual_writer.add_graph(sess.graph)
+    print("Model Imported. Visualize by running: "
+          "> tensorboard --logdir={}".format(log_dir))
diff --git a/tensorflow/python/training/checkpoint_utils.py b/tensorflow/python/training/checkpoint_utils.py
index 35d1a069d8..d52cf9a436 100644
--- a/tensorflow/python/training/checkpoint_utils.py
+++ b/tensorflow/python/training/checkpoint_utils.py
@@ -59,14 +59,14 @@ def load_checkpoint(ckpt_dir_or_file):
 
 
 def load_variable(ckpt_dir_or_file, name):
-  """Returns a tensor with the contents of the given variable in the checkpoint.
+  """Returns the tensor value of the given variable in the checkpoint.
 
   Args:
     ckpt_dir_or_file: Directory with checkpoints file or path to checkpoint.
-    name: Name of the tensor to return.
+    name: Name of the variable to return.
 
   Returns:
-    `Tensor` object.
+    A numpy `ndarray` with a copy of the value of this variable.
   """
   # TODO(b/29227106): Fix this in the right place and remove this.
   if name.endswith(":0"):
@@ -210,9 +210,8 @@ def init_from_checkpoint(ckpt_dir_or_file, assignment_map):
       else:
         var_name = ",".join([v.name for v in var])
       _set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
-      logging.info("Initialize variable %s from checkpoint %s with %s" % (
-          var_name, ckpt_dir_or_file, tensor_name_in_ckpt
-      ))
+      logging.info("Initialize variable %s from checkpoint %s with %s",
+                   var_name, ckpt_dir_or_file, tensor_name_in_ckpt)
     else:
       scopes = ""
       # TODO(vihanjain): Support list of 'current_var_or_name' here.
@@ -250,9 +249,8 @@ def init_from_checkpoint(ckpt_dir_or_file, assignment_map):
         if var is None:
           var = _collect_partitioned_variable(var_name, store_vars)
         _set_variable_or_list_initializer(var, ckpt_file, full_tensor_name)
-        logging.info("Initialize variable %s from checkpoint %s with %s" % (
-            var_name, ckpt_dir_or_file, full_tensor_name
-        ))
+        logging.info("Initialize variable %s from checkpoint %s with %s",
+                     var_name, ckpt_dir_or_file, full_tensor_name)
 
 
 def _get_checkpoint_filename(ckpt_dir_or_file):
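The `logging.info` calls now pass their arguments separately instead of pre-formatting with `%`. With Python's standard `logging` module, the message is then only formatted if the record is actually emitted. The sketch below demonstrates that difference on a throwaway logger; `ListHandler`, `Expensive`, and `demo` are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the example can inspect them."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class Expensive(object):
    """Counts how often it is converted to a string."""

    str_calls = 0

    def __str__(self):
        Expensive.str_calls += 1
        return "expensive"


def demo():
    Expensive.str_calls = 0
    logger = logging.getLogger("lazy_logging_demo")
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)

    logger.setLevel(logging.WARNING)  # INFO is disabled
    logger.info("value %s", Expensive())  # args form: never formatted
    lazy_calls = Expensive.str_calls
    logger.info("value %s" % Expensive())  # % form: formatted eagerly
    eager_calls = Expensive.str_calls - lazy_calls

    logger.setLevel(logging.INFO)
    logger.info("Initialize variable %s from checkpoint %s with %s",
                "w", "/tmp/ckpt", "w_in_ckpt")
    logger.removeHandler(handler)
    return lazy_calls, eager_calls, handler.messages
```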
diff --git a/tensorflow/python/training/input.py b/tensorflow/python/training/input.py
index 83f1ac4abd..e9fe9215ae 100644
--- a/tensorflow/python/training/input.py
+++ b/tensorflow/python/training/input.py
@@ -879,9 +879,6 @@ def batch(tensors, batch_size, num_threads=1, capacity=32,
   `get_shape` method will have a first `Dimension` value of `None`, and
   operations that depend on fixed batch_size would fail.
 
-  Note: if `num_epochs` is not `None`, this function creates local counter
-  `epochs`. Use `local_variables_initializer()` to initialize local variables.
-
   Args:
     tensors: The list or dictionary of tensors to enqueue.
     batch_size: The new batch size pulled from the queue.
@@ -1181,9 +1178,6 @@ def shuffle_batch(tensors, batch_size, capacity, min_after_dequeue,
   `get_shape` method will have a first `Dimension` value of `None`, and
   operations that depend on fixed batch_size would fail.
 
-  Note: if `num_epochs` is not `None`, this function creates local counter
-  `epochs`. Use `local_variables_initializer()` to initialize local variables.
-
   Args:
     tensors: The list or dictionary of tensors to enqueue.
     batch_size: The new batch size pulled from the queue.
diff --git a/tensorflow/python/training/monitored_session.py b/tensorflow/python/training/monitored_session.py
index fcec3ed97c..a891bae5f2 100644
--- a/tensorflow/python/training/monitored_session.py
+++ b/tensorflow/python/training/monitored_session.py
@@ -559,7 +559,7 @@ class MonitoredSession(_MonitoredSession):
 
   ```python
   saver_hook = CheckpointSaverHook(...)
-  summary_hook = SummaryHook(...)
+  summary_hook = SummarySaverHook(...)
   with MonitoredSession(session_creator=ChiefSessionCreator(...),
                         hooks=[saver_hook, summary_hook]) as sess:
     while not sess.should_stop():
@@ -648,7 +648,7 @@ class SingularMonitoredSession(_MonitoredSession):
   Example usage:
   ```python
   saver_hook = CheckpointSaverHook(...)
-  summary_hook = SummaryHook(...)
+  summary_hook = SummarySaverHook(...)
   with SingularMonitoredSession(hooks=[saver_hook, summary_hook]) as sess:
     while not sess.should_stop():
       sess.run(train_op)
diff --git a/tensorflow/python/training/saver.py b/tensorflow/python/training/saver.py
index 43b6174246..a65ab79495 100644
--- a/tensorflow/python/training/saver.py
+++ b/tensorflow/python/training/saver.py
@@ -935,11 +935,11 @@ def get_checkpoint_state(checkpoint_dir, latest_filename=None):
           ckpt.all_model_checkpoint_paths[i] = os.path.join(checkpoint_dir, p)
   except errors.OpError as e:
     # It's ok if the file cannot be read
-    logging.warning(str(e))
+    logging.warning("%s: %s", type(e).__name__, e)
     logging.warning("%s: Checkpoint ignored", coord_checkpoint_filename)
     return None
   except text_format.ParseError as e:
-    logging.warning(str(e))
+    logging.warning("%s: %s", type(e).__name__, e)
     logging.warning("%s: Checkpoint ignored", coord_checkpoint_filename)
     return None
   finally:
diff --git a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
index 1bb90afd63..c1e72bb565 100644
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -69,14 +69,6 @@ bool FLAGS_prefer_cubin_to_ptx = true;
 
 namespace perftools {
 namespace gputools {
-namespace rng {
-class RngSupport;
-}  // namespace rng
-}  // namespace gputools
-}  // namespace perftools
-
-namespace perftools {
-namespace gputools {
 namespace cuda {
 
 // Hook that can be used to CUBIN-ate PTX before it is loaded into the driver.
diff --git a/tensorflow/stream_executor/cuda/cuda_gpu_executor.h b/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
index 9d386b5ed9..6c5b9dca90 100644
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
@@ -37,17 +37,6 @@ limitations under the License.
 
 namespace perftools {
 namespace gputools {
-namespace blas {
-class BlasSupport;
-}
-namespace internal {
-class RngSupport;
-}  // namespace internal
-}  // namespace gputools
-}  // namespace perftools
-
-namespace perftools {
-namespace gputools {
 namespace cuda {
 
 // CUDA-platform implementation of the platform-agnostic
diff --git a/tensorflow/tensorboard/backend/application_test.py b/tensorflow/tensorboard/backend/application_test.py
index a5181401fa..4ea627def7 100644
--- a/tensorflow/tensorboard/backend/application_test.py
+++ b/tensorflow/tensorboard/backend/application_test.py
@@ -227,6 +227,19 @@ class TensorboardServerTest(test.TestCase):
       response.read()
       connection.close()
 
+  def testScalars(self):
+    """Test the format of /data/scalars."""
+    data = self._getJson('/data/scalars?run=run1&tag=simple_values')
+    self.assertEqual(len(data), self._SCALAR_COUNT)
+
+  def testScalarsCsv(self):
+    """Test the csv format of /data/scalars."""
+    data = self._get(
+        '/data/scalars?run=run1&tag=simple_values&format=csv').read()
+    line_count = data.count('\n')
+    self.assertEqual(line_count,
+                     self._SCALAR_COUNT + 1)  # include 1 more line for header
+
   def testHistograms(self):
     """Test the format of /data/histograms."""
     self.assertEqual(
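`testScalarsCsv` expects one more newline than there are scalar events because the CSV response starts with a header row. A small sketch of that counting with Python's `csv` module; the column names are an assumption for illustration, not taken from the diff.

```python
import csv
import io


def scalars_to_csv(events):
    """Serialize (wall_time, step, value) triples as CSV with a header row.

    The column names below are assumed for this sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Wall time", "Step", "Value"])
    for wall_time, step, value in events:
        writer.writerow([wall_time, step, value])
    return out.getvalue()
```

`csv.writer` ends rows with `"\r\n"` by default, so counting `"\n"` still yields one per row.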
diff --git a/tensorflow/tensorboard/backend/event_processing/event_accumulator_test.py b/tensorflow/tensorboard/backend/event_processing/event_accumulator_test.py
index 3734e470b6..f7d424cb91 100644
--- a/tensorflow/tensorboard/backend/event_processing/event_accumulator_test.py
+++ b/tensorflow/tensorboard/backend/event_processing/event_accumulator_test.py
@@ -225,6 +225,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertTagsEqual(x.Tags(), {})
 
   def testTags(self):
+    """Tags should be found in EventAccumulator after adding some events."""
     gen = _EventGenerator(self)
     gen.AddScalar('s1')
     gen.AddScalar('s2')
@@ -245,6 +246,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     })
 
   def testReload(self):
+    """EventAccumulator contains suitable tags after calling Reload."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     acc.Reload()
@@ -267,6 +269,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     })
 
   def testScalars(self):
+    """Tests whether EventAccumulator contains scalars after adding them."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     s1 = ea.ScalarEvent(wall_time=1, step=10, value=32)
@@ -293,6 +296,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
       self.assertEqual(expected_value, gotten_event.value[i])
 
   def testHealthPills(self):
+    """HealthPills should be properly inserted into EventAccumulator."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     gen.AddHealthPill(13371337, 41, 'Add', 0, range(1, 13))
@@ -328,6 +332,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertItemsEqual(['Add', 'MatMul'], acc.GetOpsWithHealthPills())
 
   def testHistograms(self):
+    """Tests whether histograms are inserted into EventAccumulator."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
 
@@ -377,6 +382,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertEqual(acc.Histograms('hst2'), [hst2])
 
   def testCompressedHistograms(self):
+    """Tests compressed histograms inserted into EventAccumulator."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen, compression_bps=(0, 2500, 5000, 7500, 10000))
 
@@ -428,6 +434,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertEqual(acc.CompressedHistograms('hst2'), [expected_cmphst2])
 
   def testCompressedHistogramsWithEmptyHistogram(self):
+    """Empty histograms should be compressed properly in EventAccumulator."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen, compression_bps=(0, 2500, 5000, 7500, 10000))
 
@@ -481,6 +488,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertAlmostEqual(vals[8].value, 1.0)
 
   def testImages(self):
+    """Tests 2 images inserted/accessed in EventAccumulator."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     im1 = ea.ImageEvent(
@@ -514,6 +522,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertEqual(acc.Images('im2'), [im2])
 
   def testAudio(self):
+    """Tests 2 audio events inserted/accessed in EventAccumulator."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     snd1 = ea.AudioEvent(
@@ -551,6 +560,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
     self.assertEqual(acc.Audio('snd2'), [snd2])
 
   def testKeyError(self):
+    """KeyError should be raised when accessing non-existing keys."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     acc.Reload()
@@ -574,7 +584,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
       acc.Audio('hst1')
 
   def testNonValueEvents(self):
-    """Tests that non-value events in the generator don't cause early exits."""
+    """Non-value events in the generator don't cause early exits."""
     gen = _EventGenerator(self)
     acc = ea.EventAccumulator(gen)
     gen.AddScalar('s1', wall_time=1, step=10, value=20)
diff --git a/tensorflow/tensorboard/dist/tf-tensorboard.html b/tensorflow/tensorboard/dist/tf-tensorboard.html
index 8610940ac3..9f13cdfb47 100644
--- a/tensorflow/tensorboard/dist/tf-tensorboard.html
+++ b/tensorflow/tensorboard/dist/tf-tensorboard.html
@@ -20112,7 +20112,7 @@ var TF;
           new TF.Dashboard.TfGraphDashboard(backend, debuggerDataEnabled),
           new TF.Dashboard.TfDistributionDashboard(backend),
           new TF.Dashboard.TfHistogramDashboard(backend),
-          new TF.Dashboard.VzProjectorDashboard('/data/plugin/projector'),
+          new TF.Dashboard.VzProjectorDashboard('data/plugin/projector'),
           new TF.Dashboard.TfTextDashboard(backend),
         ];
       },
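Dropping the leading slash from `'/data/plugin/projector'` makes the path resolve relative to the current page rather than the server root. Presumably this matters when TensorBoard is served under a path prefix, such as behind a reverse proxy. Python's `urllib.parse.urljoin` follows the same relative-reference rules that browsers apply; the page URL below is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical page URL for TensorBoard served under a path prefix.
PAGE = "http://example.com/proxy/tensorboard/"

absolute = urljoin(PAGE, "/data/plugin/projector")  # discards the prefix
relative = urljoin(PAGE, "data/plugin/projector")   # keeps the prefix
```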
diff --git a/tensorflow/tensorflow.bzl b/tensorflow/tensorflow.bzl
index ddffabd8cb..7baddf301c 100644
--- a/tensorflow/tensorflow.bzl
+++ b/tensorflow/tensorflow.bzl
@@ -1185,7 +1185,7 @@ def tf_version_info_genrule():
       ],
       outs=["util/version_info.cc"],
       cmd=
-      "$(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$@\"",
+      "$(PYTHON_BIN_PATH) $(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$@\"",
       local=1,
       tools=[clean_dep("//tensorflow/tools/git:gen_git_source.py")],)
 
diff --git a/tensorflow/tools/api/golden/tensorflow.layers.pbtxt b/tensorflow/tools/api/golden/tensorflow.layers.pbtxt
index 6ca38e259b..78b10c44a2 100644
--- a/tensorflow/tools/api/golden/tensorflow.layers.pbtxt
+++ b/tensorflow/tools/api/golden/tensorflow.layers.pbtxt
@@ -33,6 +33,10 @@ tf_module {
     argspec: "args=[\'inputs\', \'filters\', \'kernel_size\', \'strides\', \'padding\', \'data_format\', \'dilation_rate\', \'activation\', \'use_bias\', \'kernel_initializer\', \'bias_initializer\', \'kernel_regularizer\', \'bias_regularizer\', \'activity_regularizer\', \'trainable\', \'name\', \'reuse\'], varargs=None, keywords=None, defaults=[\'(1, 1, 1)\', \'valid\', \'channels_last\', \'(1, 1, 1)\', \'None\', \'True\', \'None\', \'<tensorflow.python.ops.init_ops.Zeros object instance>\', \'None\', \'None\', \'None\', \'True\', \'None\', \'None\'], "
   }
   member_method {
+    name: "conv3d_transpose"
+    argspec: "args=[\'inputs\', \'filters\', \'kernel_size\', \'strides\', \'padding\', \'data_format\', \'activation\', \'use_bias\', \'kernel_initializer\', \'bias_initializer\', \'kernel_regularizer\', \'bias_regularizer\', \'activity_regularizer\', \'trainable\', \'name\', \'reuse\'], varargs=None, keywords=None, defaults=[\'(1, 1, 1)\', \'valid\', \'channels_last\', \'None\', \'True\', \'None\', \'<tensorflow.python.ops.init_ops.Zeros object instance>\', \'None\', \'None\', \'None\', \'True\', \'None\', \'None\'], "
+  }
+  member_method {
     name: "dense"
     argspec: "args=[\'inputs\', \'units\', \'activation\', \'use_bias\', \'kernel_initializer\', \'bias_initializer\', \'kernel_regularizer\', \'bias_regularizer\', \'activity_regularizer\', \'trainable\', \'name\', \'reuse\'], varargs=None, keywords=None, defaults=[\'None\', \'True\', \'None\', \'<tensorflow.python.ops.init_ops.Zeros object instance>\', \'None\', \'None\', \'None\', \'True\', \'None\', \'None\'], "
   }
diff --git a/tensorflow/tools/api/golden/tensorflow.nn.pbtxt b/tensorflow/tools/api/golden/tensorflow.nn.pbtxt
index 192ceac2dd..3a448798b2 100644
--- a/tensorflow/tools/api/golden/tensorflow.nn.pbtxt
+++ b/tensorflow/tools/api/golden/tensorflow.nn.pbtxt
@@ -70,7 +70,7 @@ tf_module {
   }
   member_method {
     name: "conv3d_transpose"
-    argspec: "args=[\'value\', \'filter\', \'output_shape\', \'strides\', \'padding\', \'data_format\', \'name\'], varargs=None, keywords=None, defaults=[\'SAME\', \'None\', \'None\'], "
+    argspec: "args=[\'value\', \'filter\', \'output_shape\', \'strides\', \'padding\', \'data_format\', \'name\'], varargs=None, keywords=None, defaults=[\'SAME\', \'NDHWC\', \'None\'], "
   }
   member_method {
     name: "convolution"
diff --git a/tensorflow/tools/api/golden/tensorflow.pbtxt b/tensorflow/tools/api/golden/tensorflow.pbtxt
index fb636d9525..046d82c2d5 100644
--- a/tensorflow/tools/api/golden/tensorflow.pbtxt
+++ b/tensorflow/tools/api/golden/tensorflow.pbtxt
@@ -637,6 +637,10 @@ tf_module {
     argspec: "args=[\'x\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
   }
   member_method {
+    name: "atan2"
+    argspec: "args=[\'y\', \'x\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
+  }
+  member_method {
     name: "batch_to_space"
     argspec: "args=[\'input\', \'crops\', \'block_size\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
   }
diff --git a/tensorflow/tools/ci_build/update_version.sh b/tensorflow/tools/ci_build/update_version.sh
index e0ca8c3259..682f5329f5 100755
--- a/tensorflow/tools/ci_build/update_version.sh
+++ b/tensorflow/tools/ci_build/update_version.sh
@@ -73,8 +73,11 @@ OLD_MAJOR=$(cat ${VERSION_H} | grep -E "^#define TF_MAJOR_VERSION [0-9]+" | \
 cut -d ' ' -f 3)
 OLD_MINOR=$(cat ${VERSION_H} | grep -E "^#define TF_MINOR_VERSION [0-9]+" | \
 cut -d ' ' -f 3)
-OLD_PATCH=$(cat ${VERSION_H} | grep -E "^#define TF_PATCH_VERSION [[:alnum:]-]+" | \
+OLD_PATCH_NUM=$(cat ${VERSION_H} | grep -E "^#define TF_PATCH_VERSION [[:alnum:]-]+" | \
 cut -d ' ' -f 3)
+OLD_EXTENSION=$(cat ${VERSION_H} | grep -E "^#define TF_VERSION_SUFFIX \"[[:alnum:]-]+\"" | \
+cut -d ' ' -f 3)
+OLD_PATCH="$OLD_PATCH_NUM${OLD_EXTENSION//\"}"
 OLD_PIP_PATCH="${OLD_PATCH//-}"
 
 sed -i -e "s/^#define TF_MAJOR_VERSION ${OLD_MAJOR}/#define TF_MAJOR_VERSION ${MAJOR}/g" ${VERSION_H}
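The updated script reads the numeric patch level and the quoted `TF_VERSION_SUFFIX` separately, strips the double quotes from the suffix (`${OLD_EXTENSION//\"}`), and concatenates them. `OLD_PIP_PATCH` then drops every `-` (`${OLD_PATCH//-}`). Below is a Python sketch of the same string handling. It runs against a hypothetical `version.h` excerpt whose values are illustrative only:

```python
import re

# Hypothetical excerpt of version.h; the script reads the real file from ${VERSION_H}.
VERSION_H = '''
#define TF_MAJOR_VERSION 1
#define TF_MINOR_VERSION 1
#define TF_PATCH_VERSION 0
#define TF_VERSION_SUFFIX "-rc2"
'''

def old_patch_strings(header_text):
    """Mirror OLD_PATCH_NUM, OLD_EXTENSION, OLD_PATCH and OLD_PIP_PATCH."""
    # [A-Za-z0-9-] corresponds to the script's [[:alnum:]-] bracket expression.
    patch_num = re.search(r'^#define TF_PATCH_VERSION ([A-Za-z0-9-]+)',
                          header_text, re.M).group(1)
    ext_match = re.search(r'^#define TF_VERSION_SUFFIX ("[A-Za-z0-9-]+")',
                          header_text, re.M)
    # An empty suffix ("") does not match the pattern, so grep yields nothing.
    extension = ext_match.group(1) if ext_match else ''
    old_patch = patch_num + extension.replace('"', '')   # ${OLD_EXTENSION//\"}
    old_pip_patch = old_patch.replace('-', '')           # ${OLD_PATCH//-}
    return old_patch, old_pip_patch

print(old_patch_strings(VERSION_H))  # ('0-rc2', '0rc2')
```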
diff --git a/tensorflow/tools/common/public_api.py b/tensorflow/tools/common/public_api.py
index 837f11f690..cab3b2ff6a 100644
--- a/tensorflow/tools/common/public_api.py
+++ b/tensorflow/tools/common/public_api.py
@@ -38,7 +38,7 @@ class PublicAPIVisitor(object):
     self._visitor = visitor
 
     # Modules/classes we do not want to descend into if we hit them. Usually,
-    # sytem modules exposed through platforms for compatibility reasons.
+    # system modules exposed through platforms for compatibility reasons.
     # Each entry maps a module path to a name to ignore in traversal.
     self._do_not_descend_map = {
         '': [
diff --git a/tensorflow/tools/compatibility/tf_upgrade.py b/tensorflow/tools/compatibility/tf_upgrade.py
index 80439f835a..9a4a8ff71d 100644
--- a/tensorflow/tools/compatibility/tf_upgrade.py
+++ b/tensorflow/tools/compatibility/tf_upgrade.py
@@ -34,6 +34,10 @@ class APIChangeSpec(object):
     # Maps from a function name to a dictionary that describes how to
     # map from an old argument keyword to the new argument keyword.
     self.function_keyword_renames = {
+        "tf.batch_matmul": {
+            "adj_x": "adjoint_a",
+            "adj_y": "adjoint_b",
+        },
         "tf.count_nonzero": {
             "reduction_indices": "axis"
         },
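`function_keyword_renames` is a per-function lookup table that maps old keyword names to new ones. The upgrade tool applies it to Python source. The sketch below is much simpler and only illustrates what the table means, by applying it to a plain keyword-argument dict. The helper `rename_keywords` is hypothetical and is not part of `tf_upgrade.py`.

```python
# Subset of the table from the diff: old keyword -> new keyword, per function.
FUNCTION_KEYWORD_RENAMES = {
    "tf.batch_matmul": {"adj_x": "adjoint_a", "adj_y": "adjoint_b"},
    "tf.count_nonzero": {"reduction_indices": "axis"},
}

def rename_keywords(full_name, kwargs):
    """Hypothetical helper: return a copy of kwargs with keys renamed for full_name."""
    renames = FUNCTION_KEYWORD_RENAMES.get(full_name, {})
    # Keywords without an entry (e.g. `name`) pass through unchanged.
    return {renames.get(key, key): value for key, value in kwargs.items()}

print(rename_keywords("tf.batch_matmul", {"adj_x": True, "name": "mm"}))
# {'adjoint_a': True, 'name': 'mm'}
```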
diff --git a/tensorflow/tools/docker/Dockerfile.devel b/tensorflow/tools/docker/Dockerfile.devel
index bfac54c601..a0e0f88d9d 100644
--- a/tensorflow/tools/docker/Dockerfile.devel
+++ b/tensorflow/tools/docker/Dockerfile.devel
@@ -92,7 +92,8 @@ WORKDIR /tensorflow
 ENV CI_BUILD_PYTHON python
 
 RUN tensorflow/tools/ci_build/builds/configured CPU \
-    bazel build -c opt tensorflow/tools/pip_package:build_pip_package && \
+    bazel build -c opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
+        tensorflow/tools/pip_package:build_pip_package && \
     bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
     pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
     rm -rf /tmp/pip && \
diff --git a/tensorflow/tools/docker/Dockerfile.devel-gpu b/tensorflow/tools/docker/Dockerfile.devel-gpu
index 7726cbdfbf..743c05ef88 100644
--- a/tensorflow/tools/docker/Dockerfile.devel-gpu
+++ b/tensorflow/tools/docker/Dockerfile.devel-gpu
@@ -92,7 +92,8 @@ ENV TF_NEED_CUDA 1
 ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,5.2,6.0,6.1
 
 RUN tensorflow/tools/ci_build/builds/configured GPU \
-    bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package && \
+    bazel build -c opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
+        tensorflow/tools/pip_package:build_pip_package && \
     bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
     pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
     rm -rf /tmp/pip && \
diff --git a/tensorflow/tools/docs/doc_generator_visitor.py b/tensorflow/tools/docs/doc_generator_visitor.py
index 119305bece..8f7b91fa75 100644
--- a/tensorflow/tools/docs/doc_generator_visitor.py
+++ b/tensorflow/tools/docs/doc_generator_visitor.py
@@ -170,7 +170,7 @@ class DocGeneratorVisitor(object):
     master names to a lexicographically sorted list of all aliases for that name
     (incl. the master name).
 
-    All these are computed and set as fields if they haven't aready.
+    All these are computed and set as fields if they haven't already.
     """
     if self._reverse_index is not None:
       return
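The docstring describes a mapping from each master name to a lexicographically sorted list of all its aliases, including the master name itself. The sketch below assumes that aliases are detected by object identity. It also picks the master name with a hypothetical rule (shortest alias, ties broken lexicographically) that is not necessarily the one `DocGeneratorVisitor` uses.

```python
def build_reverse_index(index):
    """Group names bound to the same object; map master name -> sorted aliases.

    `index` maps a full dotted name to the Python object it refers to.
    The master-name rule (shortest, then lexicographic) is an assumption.
    """
    aliases_by_id = {}
    for name, obj in index.items():
        aliases_by_id.setdefault(id(obj), []).append(name)
    reverse_index = {}
    for names in aliases_by_id.values():
        master = min(names, key=lambda n: (len(n), n))
        reverse_index[master] = sorted(names)  # includes the master name
    return reverse_index

def _op():
    pass

def _other():
    pass

# Hypothetical names: `tf.a.op` and `tf.op` are two paths to the same function.
index = {"tf.a.op": _op, "tf.op": _op, "tf.b.other": _other}
print(build_reverse_index(index))
# {'tf.op': ['tf.a.op', 'tf.op'], 'tf.b.other': ['tf.b.other']}
```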
diff --git a/tensorflow/tools/docs/pretty_docs.py b/tensorflow/tools/docs/pretty_docs.py
index 824f46170e..657d6bf849 100644
--- a/tensorflow/tools/docs/pretty_docs.py
+++ b/tensorflow/tools/docs/pretty_docs.py
@@ -230,7 +230,7 @@ def _build_signature(obj_info):
 
 
 def _build_compatibility(compatibility):
-  """Return the compatability section as an md string."""
+  """Return the compatibility section as an md string."""
   parts = []
   sorted_keys = sorted(compatibility.keys())
   for key in sorted_keys:
diff --git a/tensorflow/tools/graph_transforms/README.md b/tensorflow/tools/graph_transforms/README.md
index c9f0a9b222..9bb438fa71 100644
--- a/tensorflow/tools/graph_transforms/README.md
+++ b/tensorflow/tools/graph_transforms/README.md
@@ -81,10 +81,10 @@ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
 --out_graph=optimized_inception_graph.pb \
 --inputs='Mul:0' \
 --outputs='softmax:0' \
---transforms='\
-strip_unused_nodes(type=float, shape="1,299,299,3") \
-remove_nodes(op=Identity, op=CheckNumerics) \
-fold_old_batch_norms \
+--transforms='
+strip_unused_nodes(type=float, shape="1,299,299,3")
+remove_nodes(op=Identity, op=CheckNumerics)
+fold_old_batch_norms
 '
 ```
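The README change drops the trailing backslashes because a backslash is not an escape character inside single quotes in a POSIX shell. With the old form, literal `\` characters (each followed by a newline) ended up in the `--transforms` value. Python's `shlex` module follows POSIX quoting rules in its default mode, so it can illustrate the difference. The transform list below is abbreviated.

```python
import shlex

# Old form: backslash-newline inside the single quotes.
old_cmd = "transform_graph --transforms='\\\nstrip_unused_nodes \\\nfold_old_batch_norms \\\n'"
# New form: plain newlines inside the single quotes.
new_cmd = "transform_graph --transforms='\nstrip_unused_nodes\nfold_old_batch_norms\n'"

old_arg = shlex.split(old_cmd)[1]
new_arg = shlex.split(new_cmd)[1]
print(repr(old_arg))  # literal backslashes survive inside the quotes
print(repr(new_arg))  # only whitespace separates the transform names
```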
 
@@ -94,7 +94,10 @@ transforms to modify the graph with. The transforms are given as a list of
 names, and can each have arguments themselves. These transforms define the
 pipeline of modifications that are applied in order to produce the output.
 Sometimes you need some transforms to happen before others, and the ordering
-within the list lets you specify which happen first.
+within the list lets you specify which happen first. 
+Note that the optimization
+`remove_nodes(op=Identity, op=CheckNumerics)` will break models that use control
+flow operations, such as `tf.cond`, `tf.map_fn`, and `tf.while_loop`.
 
 ## Inspecting Graphs
 
@@ -169,7 +172,7 @@ then you'll need to make local modifications to the build files to include the
 right .cc file that defines it. In a lot of cases the op is just a vestigial
 remnant from the training process though, and if that's true then you can run
 the [strip_unused_nodes](#strip_unused_nodes), specifying the inputs and outputs
-of your inference usage, to remove those unneccessary nodes:
+of your inference usage, to remove those unnecessary nodes:
 
 ```bash
 bazel build tensorflow/tools/graph_transforms:transform_graph
diff --git a/tensorflow/tools/pip_package/BUILD b/tensorflow/tools/pip_package/BUILD
index c17a7f7fb1..83be430e7d 100644
--- a/tensorflow/tools/pip_package/BUILD
+++ b/tensorflow/tools/pip_package/BUILD
@@ -68,6 +68,7 @@ py_binary(
         ":included_headers",
         "//tensorflow/contrib/nn:nn_py",
         "//tensorflow/contrib/session_bundle:session_bundle_pip",
+        "//tensorflow/contrib/signal:signal_py",
         "//tensorflow/contrib/slim/python/slim/data:data_pip",
         "//tensorflow/python:util_example_parser_configuration",
         "//tensorflow/python/debug:debug_pip",
@@ -141,6 +142,7 @@ sh_binary(
             "//tensorflow/contrib/ndlstm:ndlstm",
             "//tensorflow/contrib/nn:nn_py",
             "//tensorflow/contrib/session_bundle:session_bundle_pip",
+            "//tensorflow/contrib/signal:signal_py",
             "//tensorflow/contrib/slim:slim",
             "//tensorflow/contrib/slim/python/slim/data:data_pip",
             "//tensorflow/contrib/slim/python/slim/nets:nets_pip",
diff --git a/tensorflow/tools/pip_package/setup.py b/tensorflow/tools/pip_package/setup.py
index 3ee99d5d31..f8982073c3 100644
--- a/tensorflow/tools/pip_package/setup.py
+++ b/tensorflow/tools/pip_package/setup.py
@@ -29,7 +29,7 @@ from setuptools.dist import Distribution
 # This version string is semver compatible, but incompatible with pip.
 # For pip, we will remove all '-' characters from this string, and use the
 # result for pip.
-_VERSION = '1.1.0-rc2'
+_VERSION = '1.1.0'
 
 REQUIRED_PACKAGES = [
     'numpy >= 1.11.0',
diff --git a/tensorflow/tools/quantization/quantize_graph.py b/tensorflow/tools/quantization/quantize_graph.py
index d09349a79b..a0cfc352d4 100644
--- a/tensorflow/tools/quantization/quantize_graph.py
+++ b/tensorflow/tools/quantization/quantize_graph.py
@@ -453,7 +453,8 @@ class GraphRewriter(object):
 
   def round_nodes_recursively(self, current_node):
     """The entry point for simple rounding quantization."""
-    if self.already_visited[current_node.name]:
+    if (current_node.name in self.already_visited
+       ) and self.already_visited[current_node.name]:
       return
     self.already_visited[current_node.name] = True
     for input_node_name in current_node.input:
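The patched check guards the lookup because indexing a dict with a node name that has not been recorded yet raises `KeyError`. The sketch below is self-contained and applies the same visited-map pattern to a hypothetical graph that maps each node name to its input names. It uses the more compact `dict.get` form of the guard.

```python
def visit_recursively(graph, node_name, already_visited, order):
    """Depth-first walk that records each node once, inputs before the node.

    graph: hypothetical dict mapping a node name to the list of its input names.
    """
    if already_visited.get(node_name, False):  # equivalent to the patched guard
        return
    already_visited[node_name] = True
    for input_name in graph[node_name]:
        visit_recursively(graph, input_name, already_visited, order)
    order.append(node_name)

graph = {"softmax": ["matmul"], "matmul": ["input", "weights"],
         "input": [], "weights": []}
order = []
visit_recursively(graph, "softmax", {}, order)
print(order)  # ['input', 'weights', 'matmul', 'softmax']
```

In a diamond-shaped graph, the shared input is reached twice but recorded only once.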
diff --git a/tensorflow/tools/tfprof/README.md b/tensorflow/tools/tfprof/README.md
index c83cdd45b6..540e179aae 100644
--- a/tensorflow/tools/tfprof/README.md
+++ b/tensorflow/tools/tfprof/README.md
@@ -30,7 +30,7 @@ statistics.
 
 tfprof is part of TensorFlow core. Simply ```import tensorflow as tf```.
 
-### Examine the shapes and sizes of all trainiable Variables.
+### Examine the shapes and sizes of all trainable Variables.
 ```python
 # Print trainable variable parameter statistics to stdout.
 param_stats = tf.contrib.tfprof.model_analyzer.print_model_analysis(
@@ -439,7 +439,7 @@ with gfile.Open(os.path.join(output_dir, "run_meta"), "w") as f:
 <b>--op_log_path:</b>
 tensorflow::tfprof::OpLog. A proto used to provide extra op information
 for ops. By giving a group of ops a type name, users can easily aggregate the
-statistics for those ops without accidently missing or including extra ops.
+statistics for those ops without accidentally missing or including extra ops.
 tfprof exposes the following Python API to add op information and logging.
 
 ```python
diff --git a/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl b/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
index cab533b9e7..242439daf4 100755
--- a/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
+++ b/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
@@ -227,7 +227,7 @@ def InvokeNvcc(argv, log=False):
 
   # TODO(zhengxq): for some reason, 'gcc' needs this help to find 'as'.
   # Need to investigate and fix.
-  cmd = 'PATH=' + PREFIX_DIR + ' ' + cmd
+  cmd = 'PATH=' + PREFIX_DIR + ':$PATH ' + cmd
   if log: Log(cmd)
   return os.system(cmd)
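Before this change, the command ran with `PATH` set to `PREFIX_DIR` alone, which hid every other tool directory from the child process. The fix appends the caller's `$PATH`, so `PREFIX_DIR` is still searched first and the other directories stay reachable. The sketch below expresses the same idea with `subprocess` and an explicit environment instead of a shell string. Both helper names are hypothetical.

```python
import os
import subprocess

def env_with_prefixed_path(prefix_dir, base_env=None):
    """Return a copy of base_env (default: os.environ) that searches prefix_dir first."""
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    # Keep the existing entries after prefix_dir instead of discarding them.
    env["PATH"] = prefix_dir + (os.pathsep + old_path if old_path else "")
    return env

def run_with_prefixed_path(argv, prefix_dir):
    """Hypothetical helper: run argv (a list) with prefix_dir prepended to PATH."""
    return subprocess.call(argv, env=env_with_prefixed_path(prefix_dir))
```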