From 22586bdf900640217deac6dc826054bc6e785518 Mon Sep 17 00:00:00 2001 From: Vijay Vasudevan Date: Wed, 3 May 2017 12:25:10 -0700 Subject: Branch 154885009 (#9604) * Enable grappler to propagate shapes through queues. Change: 154789133 * Add whitelist support in uid of RunConfig. Change: 154794859 * Fix a bunch of bad links and missing docs in contrib. Change: 154820641 * Don't try to refine the shapes for a node if its inference context wasn't successfully built by the AddNode() method. Change: 154838211 * Fix issue related to empty bazel.rc file. Change: 154840138 * Remove overly precise CHECK when rendering debug output for a function. An `_Arg` node can have more than three attrs, because the runtime may (and does) add system-defined attrs (viz. "_output_shapes") that do not change the meaning of the op. Change: 154850526 * Port makefile build breakage Change: 154855106 * [TF:XLA] Try to incorporate TensorFlow node structure for large HLO GraphDefs. This change assumes that a TF subgraph/op does not cross the boundary of an HLO computation and always puts top-level TF subgraphs/ops under HLO computations. Change: 154855884 * Added a unit test to check what happens when 2 shapes with known rank but unknown dimensions are merged. Change: 154856675 * [XLA] Refactor constant folding operations into a dedicated module Refactor constant folding operations into a dedicated module, and add a new ReplaceInstruction() API to collapse { computation->ReplaceInstruction(); changed=true}. Change: 154857025 * Java: Docs: Update instructions for Windows. Inspired by http://stackoverflow.com/questions/43741775/tensorflow-in-java-running-failed Change: 154859066 * Add more documentation for features and labels. Change: 154859649 * Added link to high-performance models Change: 154860213 * Navigation and index for new performance section documents. Change: 154862215 * Fix shape mismatch between loss and weights. Change: 154862650 * Add examples to TensorShape documentation and run the autoformatter. Change: 154862667 * Move linking of cudnn_plugin, cublas_plugin and cufft_plugin from stream_executor to the ops that need them. Change: 154863520 * Properly track the persistent memory usage of lookup tables. Change: 154866686 * Reset the inputs to ShapeRefiner::RunShapeFn so that it behaves the same every time it's called. To properly handle queues that have been populated by several enqueue ops, merge the shapes of the inputs to all the enqueue ops before calling InferenceContext::set_output_handle_shape(). This ensures that we detect incorrect queue setups (where the 2 enqueue ops might generate tensors with incompatible shapes), and that we take all the known shape information instead of that of just one of the enqueue ops. Change: 154866747 * Making sure an error message will be produced by session_manager when a non-tensor object is passed in. Otherwise the 'name' property is missing. Change: 154868022 * Don't needlessly synchronize the CUDA stream in CropAndResize. Make the op Async so we don't block an executor thread while waiting for the result of the box bounds check to be copied back to the host. Change: 154868460 * Add contribution guidelines and standards section to CONTRIBUTING.md Several parts are largely based on the post by @yaroslavvb at: #7443#issuecomment-279182613 Fixes #7443 Change: 154876045 * Final draft Change: 154876563 * Final draft Change: 154876646 * Fix losses documentation. Fix documentation of get_total_loss() to be correct, and add a helpful comment about a common pitfall. 
Change: 154876822 * [XLA] Second change for HLO interpreter. Extends HloEvaluator to allow evaluation of HLO Computation or single HLO instruction with non-constant operands, by traversing the instruction in post order and keeps track of each instruction along the way as evaluated literals. Change: 154877580 * [tf distributions] Move the remaining whitelisted distributions to core. Change: 154878206 * Add shape to error message. Change: 154880260 * Revert "Fix build issue when `/usr/bin/python` path is not available (#9547)" This reverts commit 95f37ebf0bd46c328266f65bbd16d319c0efab3d. --- .gitignore | 1 - CONTRIBUTING.md | 137 +++++ configure | 3 +- tensorflow/compiler/xla/service/BUILD | 23 + .../compiler/xla/service/algebraic_simplifier.cc | 186 +------ .../xla/service/algebraic_simplifier_test.cc | 133 ----- tensorflow/compiler/xla/service/gpu/BUILD | 2 + .../compiler/xla/service/hlo_constant_folding.cc | 254 +++++++-- .../compiler/xla/service/hlo_constant_folding.h | 6 +- .../xla/service/hlo_constant_folding_test.cc | 169 ++++++ tensorflow/compiler/xla/service/hlo_evaluator.cc | 254 +++++---- tensorflow/compiler/xla/service/hlo_evaluator.h | 81 ++- .../compiler/xla/service/hlo_evaluator_test.cc | 65 ++- tensorflow/compiler/xla/service/hlo_query.cc | 10 + tensorflow/compiler/xla/service/hlo_query.h | 4 + .../compiler/xla/service/hlo_tfgraph_builder.cc | 14 +- .../xla/service/hlo_tfgraph_builder_test.cc | 22 + tensorflow/contrib/distributions/BUILD | 174 ------- tensorflow/contrib/distributions/__init__.py | 155 +++--- .../python/kernel_tests/bernoulli_test.py | 300 ----------- .../distributions/python/kernel_tests/beta_test.py | 363 ------------- .../bijectors/cholesky_outer_product_test.py | 2 +- .../python/kernel_tests/bijectors/invert_test.py | 2 +- .../python/kernel_tests/categorical_test.py | 297 ----------- .../kernel_tests/dirichlet_multinomial_test.py | 479 ----------------- .../python/kernel_tests/dirichlet_test.py | 240 --------- .../python/kernel_tests/exponential_test.py | 140 ----- .../python/kernel_tests/gamma_test.py | 366 ------------- .../python/kernel_tests/laplace_test.py | 318 ------------ .../python/kernel_tests/multinomial_test.py | 341 ------------- .../python/kernel_tests/student_t_test.py | 475 ----------------- .../python/kernel_tests/uniform_test.py | 265 ---------- .../contrib/distributions/python/ops/bernoulli.py | 215 -------- .../contrib/distributions/python/ops/beta.py | 366 ------------- .../python/ops/bijectors/bijector_test_util.py | 2 +- .../distributions/python/ops/categorical.py | 242 --------- .../contrib/distributions/python/ops/chi2.py | 2 +- .../contrib/distributions/python/ops/dirichlet.py | 297 ----------- .../python/ops/dirichlet_multinomial.py | 343 ------------- .../distributions/python/ops/exponential.py | 151 ------ .../contrib/distributions/python/ops/gamma.py | 305 ----------- .../contrib/distributions/python/ops/laplace.py | 226 --------- .../contrib/distributions/python/ops/mixture.py | 2 +- .../distributions/python/ops/multinomial.py | 291 ----------- .../contrib/distributions/python/ops/student_t.py | 360 ------------- .../contrib/distributions/python/ops/uniform.py | 202 -------- .../distributions/python/ops/vector_student_t.py | 2 +- .../contrib/learn/python/learn/estimators/head.py | 11 +- .../learn/python/learn/estimators/head_test.py | 30 +- .../learn/python/learn/estimators/run_config.py | 26 +- .../python/learn/estimators/run_config_test.py | 45 ++ .../learn/python/learn/learn_runner_test.py | 3 +- 
tensorflow/contrib/losses/__init__.py | 24 +- .../contrib/losses/python/losses/__init__.py | 120 +---- .../hexagon_graph_execution/Makefile.in | 1 - tensorflow/contrib/seq2seq/__init__.py | 56 +- .../seq2seq/python/ops/attention_wrapper.py | 1 + tensorflow/contrib/seq2seq/python/ops/helper.py | 4 +- tensorflow/core/common_runtime/shape_refiner.cc | 235 ++++++--- tensorflow/core/common_runtime/shape_refiner.h | 8 + .../core/common_runtime/shape_refiner_test.cc | 33 ++ tensorflow/core/framework/function.cc | 2 +- tensorflow/core/framework/shape_inference.h | 57 ++- tensorflow/core/framework/shape_inference_test.cc | 5 + tensorflow/core/grappler/clusters/BUILD | 1 + .../core/grappler/clusters/single_machine.cc | 48 +- tensorflow/core/grappler/clusters/single_machine.h | 2 + .../core/grappler/clusters/single_machine_test.cc | 115 +++++ tensorflow/core/grappler/costs/BUILD | 4 +- tensorflow/core/grappler/costs/graph_properties.cc | 73 +++ .../core/grappler/costs/graph_properties_test.cc | 98 ++++ tensorflow/core/kernels/BUILD | 9 +- tensorflow/core/kernels/crop_and_resize_op.cc | 565 ++++++++++++--------- tensorflow/core/kernels/crop_and_resize_op.h | 8 +- .../core/kernels/crop_and_resize_op_gpu.cu.cc | 2 +- tensorflow/core/kernels/crop_and_resize_op_test.cc | 6 +- tensorflow/core/kernels/lookup_table_op.h | 13 +- tensorflow/core/ops/data_flow_ops.cc | 12 +- .../core/platform/default/build_config/BUILD | 16 + .../api_guides/python/contrib.graph_editor.md | 20 +- .../docs_src/api_guides/python/contrib.linalg.md | 2 +- .../docs_src/api_guides/python/contrib.losses.md | 15 +- tensorflow/docs_src/get_started/tflearn.md | 2 +- tensorflow/docs_src/install/install_java.md | 11 +- tensorflow/docs_src/performance/benchmarks.md | 128 +++-- tensorflow/docs_src/performance/index.md | 12 +- tensorflow/docs_src/performance/leftnav_files | 5 +- .../docs_src/performance/performance_guide.md | 9 +- .../docs_src/performance/performance_models.md | 484 ++++++++---------- tensorflow/python/estimator/estimator.py | 16 +- tensorflow/python/framework/tensor_shape.py | 52 +- tensorflow/python/kernel_tests/distributions/BUILD | 174 +++++++ .../kernel_tests/distributions/bernoulli_test.py | 320 ++++++++++++ .../python/kernel_tests/distributions/beta_test.py | 394 ++++++++++++++ .../kernel_tests/distributions/categorical_test.py | 297 +++++++++++ .../distributions/dirichlet_multinomial_test.py | 480 +++++++++++++++++ .../kernel_tests/distributions/dirichlet_test.py | 263 ++++++++++ .../kernel_tests/distributions/exponential_test.py | 171 +++++++ .../kernel_tests/distributions/gamma_test.py | 406 +++++++++++++++ .../kernel_tests/distributions/laplace_test.py | 362 +++++++++++++ .../kernel_tests/distributions/multinomial_test.py | 343 +++++++++++++ .../kernel_tests/distributions/student_t_test.py | 516 +++++++++++++++++++ .../kernel_tests/distributions/uniform_test.py | 286 +++++++++++ tensorflow/python/ops/distributions/BUILD | 1 + tensorflow/python/ops/distributions/bernoulli.py | 215 ++++++++ tensorflow/python/ops/distributions/beta.py | 366 +++++++++++++ tensorflow/python/ops/distributions/categorical.py | 242 +++++++++ .../ops/distributions/conditional_distribution.py | 2 +- tensorflow/python/ops/distributions/dirichlet.py | 297 +++++++++++ .../ops/distributions/dirichlet_multinomial.py | 343 +++++++++++++ tensorflow/python/ops/distributions/exponential.py | 151 ++++++ tensorflow/python/ops/distributions/gamma.py | 305 +++++++++++ tensorflow/python/ops/distributions/laplace.py | 226 +++++++++ 
tensorflow/python/ops/distributions/multinomial.py | 291 +++++++++++ tensorflow/python/ops/distributions/normal.py | 6 +- tensorflow/python/ops/distributions/student_t.py | 362 +++++++++++++ tensorflow/python/ops/distributions/uniform.py | 202 ++++++++ tensorflow/python/ops/losses/util.py | 8 +- tensorflow/python/ops/weights_broadcast_ops.py | 5 +- tensorflow/python/training/session_manager.py | 24 +- tensorflow/python/training/session_manager_test.py | 17 + tensorflow/tensorflow.bzl | 2 +- tensorflow/tools/docs/generate_lib.py | 1 - tools/bazel.rc | 30 ++ 124 files changed, 9570 insertions(+), 8214 deletions(-) create mode 100644 tensorflow/compiler/xla/service/hlo_constant_folding_test.cc delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/bernoulli_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/beta_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/categorical_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/dirichlet_multinomial_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/dirichlet_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/exponential_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/gamma_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/laplace_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/multinomial_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/student_t_test.py delete mode 100644 tensorflow/contrib/distributions/python/kernel_tests/uniform_test.py delete mode 100644 tensorflow/contrib/distributions/python/ops/bernoulli.py delete mode 100644 tensorflow/contrib/distributions/python/ops/beta.py delete mode 100644 tensorflow/contrib/distributions/python/ops/categorical.py delete mode 100644 tensorflow/contrib/distributions/python/ops/dirichlet.py delete mode 100644 tensorflow/contrib/distributions/python/ops/dirichlet_multinomial.py delete mode 100644 tensorflow/contrib/distributions/python/ops/exponential.py delete mode 100644 tensorflow/contrib/distributions/python/ops/gamma.py delete mode 100644 tensorflow/contrib/distributions/python/ops/laplace.py delete mode 100644 tensorflow/contrib/distributions/python/ops/multinomial.py delete mode 100644 tensorflow/contrib/distributions/python/ops/student_t.py delete mode 100644 tensorflow/contrib/distributions/python/ops/uniform.py create mode 100644 tensorflow/python/kernel_tests/distributions/bernoulli_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/beta_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/categorical_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/dirichlet_multinomial_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/dirichlet_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/exponential_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/gamma_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/laplace_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/multinomial_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/student_t_test.py create mode 100644 tensorflow/python/kernel_tests/distributions/uniform_test.py create mode 100644 tensorflow/python/ops/distributions/bernoulli.py create mode 100644 
tensorflow/python/ops/distributions/beta.py create mode 100644 tensorflow/python/ops/distributions/categorical.py create mode 100644 tensorflow/python/ops/distributions/dirichlet.py create mode 100644 tensorflow/python/ops/distributions/dirichlet_multinomial.py create mode 100644 tensorflow/python/ops/distributions/exponential.py create mode 100644 tensorflow/python/ops/distributions/gamma.py create mode 100644 tensorflow/python/ops/distributions/laplace.py create mode 100644 tensorflow/python/ops/distributions/multinomial.py create mode 100644 tensorflow/python/ops/distributions/student_t.py create mode 100644 tensorflow/python/ops/distributions/uniform.py create mode 100644 tools/bazel.rc diff --git a/.gitignore b/.gitignore index 900e5a53cb..d8ecef1e1e 100644 --- a/.gitignore +++ b/.gitignore @@ -5,7 +5,6 @@ node_modules /.tf_configure.bazelrc /bazel-* /third_party/py/numpy/numpy_include -/tools/bazel.rc /tools/python_bin_path.sh /tools/git/gen /util/python/python_include diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 5ae5c0fbbc..c36ef1ecd3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -27,3 +27,140 @@ contributions, often because we probably won't get to them right now. If you decide to start on an issue, leave a comment so that other people know that you're working on it. If you want to help out, but not alone, use the issue comment thread to coordinate. + +### Contribution guidelines and standards + +Before sending your pull request for +[review](https://github.com/tensorflow/tensorflow/pulls), +make sure your changes are consistent with the guidelines and follow the +TensorFlow coding style. + +#### General guidelines and philosophy for contribution + +* Include unit tests when you contribute new features, as they help to + a) prove that your code works correctly, b) guard against future breaking + changes to lower the maintenance cost. +* Bug fixes also generally require unit tests, because the presence of bugs + usually indicates insufficient test coverage. +* Keep API compatibility in mind when you change code in core TensorFlow, + e.g., code in [tensorflow/core](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core) and [tensorflow/python](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python). + TensorFlow has reached version 1 and hence cannot make + non-backward-compatible API changes without a major release. Reviewers of your + pull request will comment on any API compatibility issues. +* When you contribute a new feature to TensorFlow, the maintenance burden is (by + default) transferred to the TensorFlow team. This means that the benefit of the + contribution must be weighed against the cost of maintaining the feature. +* Full new features (e.g., a new op implementing a cutting-edge algorithm) + typically will live in + [tensorflow/contrib](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib) + to get some airtime before a decision is made regarding whether they are to be + migrated to the core. + +#### License + +Include a license at the top of new files. 
+ +* [C/C++ license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op.cc#L1) +* [Python license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py#L1) +* [Java license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/Graph.java#L1) +* [Go license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/go/operation.go#L1) +* [Bash license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/ci_sanity.sh#L2) +* [HTML license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tensorboard/dist/index.html#L2) +* [JavaScript/TypeScript license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tensorboard/components/tf_backend/backend.ts#L1) + +Bazel BUILD files also need to include a license section, e.g., +[BUILD example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/BUILD#L61). + +#### C++ coding style + +Changes to TensorFlow C++ code should conform to the +[Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). + +Use `clang-tidy` to check your C/C++ changes. To install clang-tidy on ubuntu:16.04, do: + +```bash +apt-get install -y clang-tidy +``` + +You can check a C/C++ file by doing: + + +```bash +clang-format <my_cc_file> --style=google > /tmp/my_cc_file.cc +diff <my_cc_file> /tmp/my_cc_file.cc +``` + +#### Python coding style + +Changes to TensorFlow Python code should conform to the +[Google Python Style Guide](https://google.github.io/styleguide/pyguide.html). + +Use `pylint` to check your Python changes. To install `pylint` and +retrieve TensorFlow's custom style definition: + +```bash +pip install pylint +wget -O /tmp/pylintrc https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/tools/ci_build/pylintrc +``` + +To check a file with `pylint`: + +```bash +pylint --rcfile=/tmp/pylintrc myfile.py +``` + +#### Coding style for other languages + +* [Google Java Style Guide](https://google.github.io/styleguide/javaguide.html) +* [Google JavaScript Style Guide](https://google.github.io/styleguide/jsguide.html) +* [Google Shell Style Guide](https://google.github.io/styleguide/shell.xml) + +#### Running sanity check + +If you have Docker installed on your system, you can perform a sanity check on +your changes by running the command: + +```bash +tensorflow/tools/ci_build/ci_build.sh CPU tensorflow/tools/ci_build/ci_sanity.sh +``` + +This will catch most license, Python coding style and BUILD file issues that +may exist in your changes. + +#### Running unit tests + +There are two ways to run TensorFlow unit tests. + +1. Using tools and libraries installed directly on your system. + + Refer to the + [CPU-only developer Dockerfile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel) and + [GPU developer Dockerfile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu) + for the required packages. Alternatively, use the corresponding + [Docker images](https://hub.docker.com/r/tensorflow/tensorflow/tags/), e.g., + `tensorflow/tensorflow:nightly-devel` and `tensorflow/tensorflow:nightly-devel-gpu` + for development to avoid installing the packages directly on your system. 
+ + Once you have the packages installed, you can run a specific unit test in + bazel by doing as follows: + + If the tests are to be run on GPU, add CUDA paths to LD_LIBRARY_PATH and add + the `cuda` option flag + + ```bash + export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH" + + export flags="--config=opt --config=cuda -k" + ``` + + For example, to run all tests under tensorflow/python, do: + + ```bash + bazel test ${flags} //tensorflow/python/... + ``` + +2. Using Docker and TensorFlow's CI scripts. + + See + [TensorFlow Builds](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/ci_build) for details. + diff --git a/configure b/configure index 1844799225..e44939bfd6 100755 --- a/configure +++ b/configure @@ -353,9 +353,8 @@ if [[ "$TF_NEED_VERBS" == "1" ]]; then fi # Append CC optimization flags to bazel.rc -echo >> tools/bazel.rc for opt in $CC_OPT_FLAGS; do - echo "build:opt --cxxopt=$opt --copt=$opt" >> tools/bazel.rc + write_to_bazelrc 'build:opt --cxxopt=$opt --copt=$opt' done # Run the gen_git_source to create links where bazel can track dependencies for diff --git a/tensorflow/compiler/xla/service/BUILD b/tensorflow/compiler/xla/service/BUILD index 05fc480936..bdb69b6e55 100644 --- a/tensorflow/compiler/xla/service/BUILD +++ b/tensorflow/compiler/xla/service/BUILD @@ -80,6 +80,8 @@ cc_library( ":hlo_query", "//tensorflow/compiler/xla:literal_util", "//tensorflow/compiler/xla:shape_util", + "//tensorflow/compiler/xla:status", + "//tensorflow/compiler/xla:status_macros", "//tensorflow/compiler/xla:statusor", "//tensorflow/compiler/xla:types", "//tensorflow/compiler/xla:util", @@ -1418,6 +1420,27 @@ cc_library( ], ) +cc_test( + name = "hlo_constant_folding_test", + srcs = ["hlo_constant_folding_test.cc"], + deps = [ + ":cpu_plugin", + ":hlo", + ":hlo_constant_folding", + ":hlo_matchers", + ":hlo_pass", + "//tensorflow/compiler/xla:literal_util", + "//tensorflow/compiler/xla:shape_util", + "//tensorflow/compiler/xla:test", + "//tensorflow/compiler/xla:types", + "//tensorflow/compiler/xla:util", + "//tensorflow/compiler/xla:xla_data_proto", + "//tensorflow/compiler/xla/tests:hlo_test_base", + "//tensorflow/core:lib", + "//tensorflow/core:test_main", + ], +) + cc_library( name = "device_memory_allocator", srcs = ["device_memory_allocator.cc"], diff --git a/tensorflow/compiler/xla/service/algebraic_simplifier.cc b/tensorflow/compiler/xla/service/algebraic_simplifier.cc index 124a949bac..d6dce9745b 100644 --- a/tensorflow/compiler/xla/service/algebraic_simplifier.cc +++ b/tensorflow/compiler/xla/service/algebraic_simplifier.cc @@ -219,12 +219,6 @@ class AlgebraicSimplifierVisitor : public DfsHloVisitorWithDefault { HloInstruction* operand, HloInstruction* max, HloInstruction* max_operand); - // Tries to constant fold a concatenate operation, and returns true if the - // operation has been performed. An error status is returned in case of error. - StatusOr TryConcatenateConstantFold( - HloInstruction* concatenate, - tensorflow::gtl::ArraySlice operands); - // A Reshape or Broadcast that feeds an element-wise operation with a unique // non-scalar operand can sink to after the operation. 
StatusOr TryToSinkReshapeOrBroadcastAfterOpWithUniqueNonScalarOperand( @@ -236,12 +230,23 @@ class AlgebraicSimplifierVisitor : public DfsHloVisitorWithDefault { Status ReplaceWithNewInstruction( HloInstruction* old_instruction, std::unique_ptr new_instruction) { - TF_CHECK_OK(computation_->ReplaceWithNewInstruction( + TF_RETURN_IF_ERROR(computation_->ReplaceWithNewInstruction( old_instruction, std::move(new_instruction))); changed_ = true; return Status::OK(); } + // Replaces the existing HLO instruction old_instruction, with + // new_instruction, and marks the optimizer status as changed. + // Returns the Status representing the result of the replace operation. + Status ReplaceInstruction(HloInstruction* old_instruction, + HloInstruction* new_instruction) { + TF_RETURN_IF_ERROR( + computation_->ReplaceInstruction(old_instruction, new_instruction)); + changed_ = true; + return Status::OK(); + } + // Current HloComputation instance the AlgebraicSimplifierVisitor is // traversing. HloComputation* computation_; @@ -290,8 +295,7 @@ void AlgebraicSimplifierVisitor::ReplaceWithBitcast( auto bitcast = computation_->AddInstruction( HloInstruction::CreateUnary(instruction->shape(), HloOpcode::kBitcast, instruction->mutable_operand(0))); - TF_CHECK_OK(computation_->ReplaceInstruction(instruction, bitcast)); - changed_ = true; + TF_CHECK_OK(ReplaceInstruction(instruction, bitcast)); } bool AlgebraicSimplifierVisitor::ReplaceInstructionIfSameShape( @@ -299,9 +303,7 @@ bool AlgebraicSimplifierVisitor::ReplaceInstructionIfSameShape( if (!SameShape(old_instruction, new_instruction)) { return false; } - TF_CHECK_OK( - computation_->ReplaceInstruction(old_instruction, new_instruction)); - changed_ = true; + TF_CHECK_OK(ReplaceInstruction(old_instruction, new_instruction)); return true; } @@ -329,63 +331,6 @@ Status AlgebraicSimplifierVisitor::HandleCopy(HloInstruction* copy, return Status::OK(); } -StatusOr AlgebraicSimplifierVisitor::TryConcatenateConstantFold( - HloInstruction* concatenate, - tensorflow::gtl::ArraySlice operands) { - if (operands[0]->opcode() == HloOpcode::kConstant) { - // If all the operands of a concatenate are constant, fold them into a - // single constant tensor. - // The concatenate dimension is going to be the sum of all the concatenate - // dimensions. - int64 concat_dim = concatenate->dimensions()[0]; - const Shape& reference_shape = operands[0]->shape(); - if (ShapeUtil::IsTuple(reference_shape)) { - VLOG(5) << "Tuples not currently supported by the concatenate constant" - " folder"; - return false; - } - int64 rank = ShapeUtil::Rank(reference_shape); - std::vector concat_dimensions(reference_shape.dimensions().begin(), - reference_shape.dimensions().end()); - if (concat_dim < 0) { - concat_dim += rank; - } - for (int64 i = 1; i < operands.size(); ++i) { - const Shape& operand_shape = operands[i]->shape(); - if (operands[i]->opcode() != HloOpcode::kConstant || - ShapeUtil::IsTuple(operand_shape)) { - return false; - } - // Accumulate the concat dimension from all tensors taking part to the - // operation. 
- concat_dimensions[concat_dim] += - ShapeUtil::GetDimension(operand_shape, concat_dim); - } - - auto literal = LiteralUtil::CreateFromDimensions( - reference_shape.element_type(), concat_dimensions); - std::vector source_indices(rank, 0); - std::vector dest_indices(concat_dimensions.size(), 0); - for (auto operand : operands) { - const Shape& operand_shape = operand->shape(); - Status status = LiteralUtil::Copy( - operand->literal(), source_indices, literal.get(), dest_indices, - AsInt64Slice(operand_shape.dimensions())); - if (!status.ok()) { - VLOG(1) << "Error while creating concatenated literal : " << status; - return false; - } - dest_indices[concat_dim] += - ShapeUtil::GetDimension(operand_shape, concat_dim); - } - TF_CHECK_OK(computation_->ReplaceWithNewInstruction( - concatenate, HloInstruction::CreateConstant(std::move(literal)))); - changed_ = true; - return true; - } - return false; -} - Status AlgebraicSimplifierVisitor::HandleConcatenate( HloInstruction* concatenate, tensorflow::gtl::ArraySlice operands) { @@ -394,13 +339,6 @@ Status AlgebraicSimplifierVisitor::HandleConcatenate( ReplaceInstructionIfSameShape(concatenate, operands[0]); return Status::OK(); } - // If all the concatenate operands are constant, this will get folded into a - // new constant literal. - TF_ASSIGN_OR_RETURN(bool folded, - TryConcatenateConstantFold(concatenate, operands)); - if (folded) { - return Status::OK(); - } // Filter out and remove empty operands. std::vector nonempty_operands; for (HloInstruction* operand : operands) { @@ -799,65 +737,6 @@ Status AlgebraicSimplifierVisitor::HandleBroadcast(HloInstruction* broadcast) { return Status::OK(); } -template -static std::unique_ptr ConvertIfTypesMatch( - const Literal& src_literal) { - CHECK_EQ(primitive_src_type, src_literal.shape().element_type()); - - return HloInstruction::CreateConstant( - LiteralUtil::Convert::type, - typename primitive_util::PrimitiveTypeToNative< - primitive_dest_type>::type>(src_literal)); -} - -template -static std::unique_ptr ConvertIfDestTypeMatches( - const Literal& src_literal, PrimitiveType primitive_dest_type) { - switch (primitive_dest_type) { -#define CONVERT_IF_TYPES_MATCH(type) \ - case (type): \ - return ConvertIfTypesMatch(src_literal); - CONVERT_IF_TYPES_MATCH(PRED) - CONVERT_IF_TYPES_MATCH(S8) - CONVERT_IF_TYPES_MATCH(S32) - CONVERT_IF_TYPES_MATCH(S64) - CONVERT_IF_TYPES_MATCH(U8) - CONVERT_IF_TYPES_MATCH(U32) - CONVERT_IF_TYPES_MATCH(U64) - CONVERT_IF_TYPES_MATCH(F32) - CONVERT_IF_TYPES_MATCH(F64) -#undef CONVERT_IF_TYPES_MATCH - // Other types are not yet supported. - default: - LOG(FATAL) << "Unimplemented: ConvertIfDestTypeMatches for type " - << PrimitiveType_Name(src_literal.shape().element_type()); - } -} - -static std::unique_ptr ConvertIfSrcTypeMatches( - const Literal& src_literal, PrimitiveType primitive_dest_type) { - switch (src_literal.shape().element_type()) { -#define CONVERT_IF_DEST_TYPE_MATCHES(type) \ - case (type): \ - return ConvertIfDestTypeMatches<(type)>(src_literal, primitive_dest_type); - CONVERT_IF_DEST_TYPE_MATCHES(PRED) - CONVERT_IF_DEST_TYPE_MATCHES(S8) - CONVERT_IF_DEST_TYPE_MATCHES(S32) - CONVERT_IF_DEST_TYPE_MATCHES(S64) - CONVERT_IF_DEST_TYPE_MATCHES(U8) - CONVERT_IF_DEST_TYPE_MATCHES(U32) - CONVERT_IF_DEST_TYPE_MATCHES(U64) - CONVERT_IF_DEST_TYPE_MATCHES(F32) - CONVERT_IF_DEST_TYPE_MATCHES(F64) -#undef CONVERT_IF_DEST_TYPE_MATCHES - // Other types are not yet supported. 
- default: - LOG(FATAL) << "Unimplemented: ConvertIfSrcTypeMatches for type " - << PrimitiveType_Name(src_literal.shape().element_type()); - } -} - // A conversion to the same element type as the operand is a nop and can be // removed. A conversion of a constant can be simplified by making a new // constant. @@ -866,14 +745,7 @@ Status AlgebraicSimplifierVisitor::HandleConvert(HloInstruction* convert, PrimitiveType src_type = operand->shape().element_type(); PrimitiveType dest_type = convert->shape().element_type(); if (src_type == dest_type) { - changed_ = true; - return computation_->ReplaceInstruction(convert, operand); - } - if (operand->opcode() == HloOpcode::kConstant) { - const Literal& src_literal = operand->literal(); - std::unique_ptr new_constant = - ConvertIfSrcTypeMatches(src_literal, dest_type); - return ReplaceWithNewInstruction(convert, std::move(new_constant)); + return ReplaceInstruction(convert, operand); } return Status::OK(); } @@ -1080,8 +952,7 @@ Status AlgebraicSimplifierVisitor::HandleReshape(HloInstruction* reshape) { // Delete no-op reshapes, i.e. where shape = operand shape. if (SameShape(reshape, operand)) { VLOG(10) << "deleting no-op reshape"; - changed_ = true; - return computation_->ReplaceInstruction(reshape, operand); + return ReplaceInstruction(reshape, operand); } // Merge reshapes. @@ -1131,8 +1002,7 @@ Status AlgebraicSimplifierVisitor::HandleReverse(HloInstruction* reverse, }; if (std::all_of(reverse->dimensions().begin(), reverse->dimensions().end(), dim_is_one)) { - changed_ = true; - return computation_->ReplaceInstruction(reverse, operand); + return ReplaceInstruction(reverse, operand); } return Status::OK(); } @@ -1143,21 +1013,6 @@ Status AlgebraicSimplifierVisitor::HandleSlice(HloInstruction* slice, if (ReplaceInstructionIfSameShape(slice, operand)) { return Status::OK(); } - if (operand->opcode() == HloOpcode::kConstant) { - const Shape& shape = slice->shape(); - auto literal = LiteralUtil::CreateFromDimensions( - shape.element_type(), AsInt64Slice(shape.dimensions())); - std::vector dest_indices(slice->slice_starts().size(), 0); - Status status = LiteralUtil::Copy(operand->literal(), slice->slice_starts(), - literal.get(), dest_indices, - AsInt64Slice(shape.dimensions())); - if (status.ok()) { - TF_CHECK_OK(ReplaceWithNewInstruction( - slice, HloInstruction::CreateConstant(std::move(literal)))); - } else { - VLOG(1) << "Error while creating sliced literal : " << status; - } - } return Status::OK(); } @@ -1247,8 +1102,7 @@ Status AlgebraicSimplifierVisitor::HandleTranspose(HloInstruction* transpose) { if (std::is_sorted(transpose->dimensions().begin(), transpose->dimensions().end())) { VLOG(10) << "deleting no-op transpose"; - changed_ = true; - return computation_->ReplaceInstruction(transpose, operand); + return ReplaceInstruction(transpose, operand); } if (HloOpcode::kTranspose == operand->opcode()) { @@ -1379,9 +1233,7 @@ Status AlgebraicSimplifierVisitor::HandleConvolution( auto new_rhs = add_bitcast(new_filter_shape, rhs); auto dot = computation_->AddInstruction(HloInstruction::CreateBinary( dot_output_shape, HloOpcode::kDot, new_lhs, new_rhs)); - changed_ = true; - return computation_->ReplaceInstruction(convolution, - add_bitcast(convolution_shape, dot)); + return ReplaceInstruction(convolution, add_bitcast(convolution_shape, dot)); } bool AlgebraicSimplifierVisitor::TransformToClampIfSameShape( diff --git a/tensorflow/compiler/xla/service/algebraic_simplifier_test.cc b/tensorflow/compiler/xla/service/algebraic_simplifier_test.cc 
index 0cce076da5..f4b42055b7 100644 --- a/tensorflow/compiler/xla/service/algebraic_simplifier_test.cc +++ b/tensorflow/compiler/xla/service/algebraic_simplifier_test.cc @@ -466,75 +466,6 @@ TEST_F(AlgebraicSimplifierTest, ConvertBetweenSameType) { EXPECT_THAT(computation->root_instruction(), input); } -TEST_F(AlgebraicSimplifierTest, ConvertF32ToS64) { - HloComputation::Builder builder(TestName()); - HloInstruction* input = builder.AddInstruction( - HloInstruction::CreateConstant(LiteralUtil::CreateR0(42.0f))); - builder.AddInstruction( - HloInstruction::CreateConvert(ShapeUtil::MakeShape(S64, {}), input)); - - auto module = MakeUnique(TestName()); - auto computation = module->AddEntryComputation(builder.Build()); - - EXPECT_THAT(computation->root_instruction(), op::Convert(input)); - - AlgebraicSimplifier simplifier(/*is_layout_sensitive=*/false, - non_bitcasting_callback()); - ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); - - EXPECT_THAT(computation->root_instruction(), op::Constant()); - EXPECT_EQ(LiteralUtil::GetFirstElement( - computation->root_instruction()->literal()), - 42); -} - -TEST_F(AlgebraicSimplifierTest, ConvertS64ToF32) { - HloComputation::Builder builder(TestName()); - HloInstruction* input = builder.AddInstruction( - HloInstruction::CreateConstant(LiteralUtil::CreateR0(42))); - builder.AddInstruction( - HloInstruction::CreateConvert(ShapeUtil::MakeShape(F32, {}), input)); - - auto module = MakeUnique(TestName()); - auto computation = module->AddEntryComputation(builder.Build()); - - EXPECT_THAT(computation->root_instruction(), op::Convert(input)); - - AlgebraicSimplifier simplifier(/*is_layout_sensitive=*/false, - non_bitcasting_callback()); - ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); - - EXPECT_THAT(computation->root_instruction(), op::Constant()); - EXPECT_EQ(LiteralUtil::GetFirstElement( - computation->root_instruction()->literal()), - 42.0f); -} - -TEST_F(AlgebraicSimplifierTest, ConvertF32ArrayToS64Array) { - HloComputation::Builder builder(TestName()); - HloInstruction* input = builder.AddInstruction(HloInstruction::CreateConstant( - LiteralUtil::CreateR1({42.0f, 19.0f}))); - builder.AddInstruction( - HloInstruction::CreateConvert(ShapeUtil::MakeShape(S64, {2}), input)); - - auto module = MakeUnique(TestName()); - auto computation = module->AddEntryComputation(builder.Build()); - - EXPECT_THAT(computation->root_instruction(), op::Convert(input)); - - AlgebraicSimplifier simplifier(/*is_layout_sensitive=*/false, - non_bitcasting_callback()); - ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); - - EXPECT_THAT(computation->root_instruction(), op::Constant()); - EXPECT_EQ( - LiteralUtil::Get(computation->root_instruction()->literal(), {0}), - 42); - EXPECT_EQ( - LiteralUtil::Get(computation->root_instruction()->literal(), {1}), - 19); -} - // Test that copies are removed. 
TEST_F(AlgebraicSimplifierTest, RemoveCopy) { Shape r0f32 = ShapeUtil::MakeShape(F32, {}); @@ -1666,69 +1597,5 @@ TEST_F(AlgebraicSimplifierTest, IteratorInvalidation) { ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); } -TEST_F(AlgebraicSimplifierTest, Concatenate) { - const struct TestConfig { - int concat_dimension; - tensorflow::gtl::ArraySlice dimensions; - tensorflow::gtl::ArraySlice concat_sizes; - } test_configs[] = { - {1, {11, 0, 7, 5, 9}, {2, 5, 7, 11}}, - {3, {1, 4, 17, 0, 8}, {1, 3, 9, 12}}, - }; - - for (auto& test_config : test_configs) { - HloComputation::Builder builder(TestName()); - std::vector dimensions(test_config.dimensions.begin(), - test_config.dimensions.end()); - int64 concat_size = 0; - std::vector operands; - for (auto csize : test_config.concat_sizes) { - dimensions[test_config.concat_dimension] = csize; - concat_size += csize; - auto literal = LiteralUtil::CreateFromDimensions(F32, dimensions); - HloInstruction* insn = builder.AddInstruction( - HloInstruction::CreateConstant(std::move(literal))); - operands.push_back(insn); - } - dimensions[test_config.concat_dimension] = concat_size; - Shape shape = ShapeUtil::MakeShape(F32, dimensions); - builder.AddInstruction(HloInstruction::CreateConcatenate( - shape, operands, test_config.concat_dimension)); - HloModule module(TestName()); - auto computation = module.AddEntryComputation(builder.Build()); - - AlgebraicSimplifier simplifier(/*is_layout_sensitive=*/false, - non_bitcasting_callback()); - ASSERT_TRUE(simplifier.Run(&module).ValueOrDie()); - - HloInstruction* root = computation->root_instruction(); - EXPECT_EQ(root->opcode(), HloOpcode::kConstant); - EXPECT_TRUE(ShapeUtil::Equal(root->shape(), shape)); - } -} - -TEST_F(AlgebraicSimplifierTest, Slice) { - HloComputation::Builder builder(TestName()); - const int64 dimensions[] = {11, 8, 7, 5, 9}; - const int64 slice_start[] = {4, 2, 3, 1, 5}; - const int64 slice_limits[] = {10, 8, 6, 5, 9}; - auto literal = LiteralUtil::CreateFromDimensions(F32, dimensions); - HloInstruction* lit_insn = builder.AddInstruction( - HloInstruction::CreateConstant(std::move(literal))); - Shape shape = ShapeUtil::MakeShape(F32, {6, 6, 3, 4, 4}); - builder.AddInstruction( - HloInstruction::CreateSlice(shape, lit_insn, slice_start, slice_limits)); - HloModule module(TestName()); - auto computation = module.AddEntryComputation(builder.Build()); - - AlgebraicSimplifier simplifier(/*is_layout_sensitive=*/false, - non_bitcasting_callback()); - ASSERT_TRUE(simplifier.Run(&module).ValueOrDie()); - - HloInstruction* root = computation->root_instruction(); - EXPECT_EQ(root->opcode(), HloOpcode::kConstant); - EXPECT_TRUE(ShapeUtil::Equal(root->shape(), shape)); -} - } // namespace } // namespace xla diff --git a/tensorflow/compiler/xla/service/gpu/BUILD b/tensorflow/compiler/xla/service/gpu/BUILD index 1fdbcfe564..d26f415fd4 100644 --- a/tensorflow/compiler/xla/service/gpu/BUILD +++ b/tensorflow/compiler/xla/service/gpu/BUILD @@ -264,6 +264,8 @@ cc_library( "//tensorflow/compiler/xla/service:tuple_points_to_analysis", "//tensorflow/core:lib", "//tensorflow/core:stream_executor_no_cuda", + "//tensorflow/core/platform/default/build_config:cublas_plugin", + "//tensorflow/core/platform/default/build_config:cudnn_plugin", "//tensorflow/core/platform/default/build_config:stream_executor_cuda", ], ) diff --git a/tensorflow/compiler/xla/service/hlo_constant_folding.cc b/tensorflow/compiler/xla/service/hlo_constant_folding.cc index 9a5345dc13..cb0a99d773 100644 --- 
a/tensorflow/compiler/xla/service/hlo_constant_folding.cc +++ b/tensorflow/compiler/xla/service/hlo_constant_folding.cc @@ -15,16 +15,14 @@ limitations under the License. #include "tensorflow/compiler/xla/service/hlo_constant_folding.h" -#include -#include #include -#include #include #include #include #include "tensorflow/compiler/xla/layout_util.h" #include "tensorflow/compiler/xla/literal_util.h" +#include "tensorflow/compiler/xla/service/dfs_hlo_visitor_with_default.h" #include "tensorflow/compiler/xla/service/hlo_computation.h" #include "tensorflow/compiler/xla/service/hlo_instruction.h" #include "tensorflow/compiler/xla/service/hlo_opcode.h" @@ -34,52 +32,222 @@ limitations under the License. #include "tensorflow/core/lib/core/errors.h" namespace xla { +namespace { + +template +static std::unique_ptr ConvertIfTypesMatch( + const Literal& src_literal) { + CHECK_EQ(primitive_src_type, src_literal.shape().element_type()); + return LiteralUtil::Convert< + typename primitive_util::PrimitiveTypeToNative::type, + typename primitive_util::PrimitiveTypeToNative< + primitive_dest_type>::type>(src_literal); +} + +template +static std::unique_ptr ConvertIfDestTypeMatches( + const Literal& src_literal, PrimitiveType primitive_dest_type) { + switch (primitive_dest_type) { +#define CONVERT_IF_TYPES_MATCH(type) \ + case (type): \ + return ConvertIfTypesMatch(src_literal); + CONVERT_IF_TYPES_MATCH(PRED) + CONVERT_IF_TYPES_MATCH(S8) + CONVERT_IF_TYPES_MATCH(S32) + CONVERT_IF_TYPES_MATCH(S64) + CONVERT_IF_TYPES_MATCH(U8) + CONVERT_IF_TYPES_MATCH(U32) + CONVERT_IF_TYPES_MATCH(U64) + CONVERT_IF_TYPES_MATCH(F32) + CONVERT_IF_TYPES_MATCH(F64) +#undef CONVERT_IF_TYPES_MATCH + // Other types are not yet supported. + default: + LOG(FATAL) << "Unimplemented: ConvertIfDestTypeMatches for type " + << PrimitiveType_Name(src_literal.shape().element_type()); + } +} + +static std::unique_ptr ConvertIfSrcTypeMatches( + const Literal& src_literal, PrimitiveType primitive_dest_type) { + switch (src_literal.shape().element_type()) { +#define CONVERT_IF_DEST_TYPE_MATCHES(type) \ + case (type): \ + return ConvertIfDestTypeMatches<(type)>(src_literal, primitive_dest_type); + CONVERT_IF_DEST_TYPE_MATCHES(PRED) + CONVERT_IF_DEST_TYPE_MATCHES(S8) + CONVERT_IF_DEST_TYPE_MATCHES(S32) + CONVERT_IF_DEST_TYPE_MATCHES(S64) + CONVERT_IF_DEST_TYPE_MATCHES(U8) + CONVERT_IF_DEST_TYPE_MATCHES(U32) + CONVERT_IF_DEST_TYPE_MATCHES(U64) + CONVERT_IF_DEST_TYPE_MATCHES(F32) + CONVERT_IF_DEST_TYPE_MATCHES(F64) +#undef CONVERT_IF_DEST_TYPE_MATCHES + // Other types are not yet supported. + default: + LOG(FATAL) << "Unimplemented: ConvertIfSrcTypeMatches for type " + << PrimitiveType_Name(src_literal.shape().element_type()); + } +} + +} // namespace + +// ConstantFolderVisitor traverses the HLO computation and reduces certain +// constant graph sections, to literals. +class ConstantFolderVisitor : public DfsHloVisitorWithDefault { + public: + // Default visitor action is to do nothing and return OK. 
+ Status DefaultAction(HloInstruction* /*hlo_instruction*/) override { + return Status::OK(); + } + + Status HandleConcatenate( + HloInstruction* concatenate, + tensorflow::gtl::ArraySlice operands) override; + + Status HandleConvert(HloInstruction* convert, + HloInstruction* operand) override; + + Status HandleReshape(HloInstruction* reshape) override; + + Status HandleSlice(HloInstruction* slice, HloInstruction* operand) override; + + Status HandleTranspose(HloInstruction* transpose) override; + + // Returns whether a constant folding operation has occurred. + const bool changed() const { return changed_; } + + // Runs the visitor on a computation and returns whether any changes were + // performed. + static StatusOr Run(HloComputation* computation); + + private: + ConstantFolderVisitor() = default; + + // Replaces the existing HLO instruction old_instruction, with a literal, + // and marks the optimizer status as changed. + // Returns the Status representing the result of the replace operation. + Status ReplaceWithConstant(HloInstruction* old_instruction, + std::unique_ptr literal) { + TF_RETURN_IF_ERROR(old_instruction->parent()->ReplaceWithNewInstruction( + old_instruction, HloInstruction::CreateConstant(std::move(literal)))); + changed_ = true; + return Status::OK(); + } + + // Whether any constant folding operations have occurred. + bool changed_ = false; +}; + +StatusOr ConstantFolderVisitor::Run(HloComputation* computation) { + ConstantFolderVisitor visitor; + TF_RETURN_IF_ERROR(computation->Accept(&visitor)); + return visitor.changed(); +} StatusOr HloConstantFolding::Run(HloModule* module) { + XLA_VLOG_LINES(2, + "HloConstantFolding::Run(), before:\n" + module->ToString()); bool changed = false; - for (auto& computation : module->computations()) { - for (auto instruction : computation->MakeInstructionPostOrder()) { - // Skip dead code. - if (instruction->user_count() == 0 && - computation->root_instruction() != instruction) { - continue; - } - // Depending on the opcode, choose how to handle constant operands. - // - // TODO(b/35975797): Fold constant computations for more than reshapes and - // transposes. 
- switch (instruction->opcode()) { - case HloOpcode::kReshape: { - if (instruction->operand(0)->opcode() == HloOpcode::kConstant) { - TF_ASSIGN_OR_RETURN( - auto reshaped_literal, - LiteralUtil::Reshape( - instruction->operand(0)->literal(), - AsInt64Slice(instruction->shape().dimensions()))); - TF_CHECK_OK(computation->ReplaceWithNewInstruction( - instruction, - HloInstruction::CreateConstant(std::move(reshaped_literal)))); - changed = true; - } - break; - } - case HloOpcode::kTranspose: { - if (instruction->operand(0)->opcode() == HloOpcode::kConstant) { - auto transposed_literal = LiteralUtil::Transpose( - instruction->operand(0)->literal(), instruction->dimensions()); - TF_CHECK_OK(computation->ReplaceWithNewInstruction( - instruction, - HloInstruction::CreateConstant(std::move(transposed_literal)))); - changed = true; - } - break; - } - default: - break; + for (auto& comp : module->computations()) { + TF_ASSIGN_OR_RETURN(bool result, ConstantFolderVisitor::Run(comp.get())); + changed = changed || result; + } + XLA_VLOG_LINES(2, "HloConstantFolding::Run(), after:\n" + module->ToString()); + return changed; +} + +Status ConstantFolderVisitor::HandleReshape(HloInstruction* reshape) { + if (reshape->operand(0)->opcode() == HloOpcode::kConstant) { + TF_ASSIGN_OR_RETURN( + auto reshaped_literal, + LiteralUtil::Reshape(reshape->operand(0)->literal(), + AsInt64Slice(reshape->shape().dimensions()))); + return ReplaceWithConstant(reshape, std::move(reshaped_literal)); + } + return Status::OK(); +} + +Status ConstantFolderVisitor::HandleTranspose(HloInstruction* transpose) { + if (transpose->operand(0)->opcode() == HloOpcode::kConstant) { + auto transposed_literal = LiteralUtil::Transpose( + transpose->operand(0)->literal(), transpose->dimensions()); + return ReplaceWithConstant(transpose, std::move(transposed_literal)); + } + return Status::OK(); +} + +Status ConstantFolderVisitor::HandleConcatenate( + HloInstruction* concatenate, + tensorflow::gtl::ArraySlice operands) { + if (operands[0]->opcode() == HloOpcode::kConstant) { + // If all the operands of a concatenate are constant, fold them into a + // single constant tensor. + // The result concatenate dimension is going to be the sum of all the + // concatenate dimensions of the arrays taking part of the operation. + int64 concat_dim = concatenate->dimensions()[0]; + const Shape& reference_shape = operands[0]->shape(); + CHECK(!ShapeUtil::IsTuple(reference_shape)); + int64 rank = ShapeUtil::Rank(reference_shape); + std::vector concat_dimensions(reference_shape.dimensions().begin(), + reference_shape.dimensions().end()); + if (concat_dim < 0) { + concat_dim += rank; + } + for (int64 i = 1; i < operands.size(); ++i) { + const Shape& operand_shape = operands[i]->shape(); + CHECK(!ShapeUtil::IsTuple(operand_shape)); + if (operands[i]->opcode() != HloOpcode::kConstant) { + return Status::OK(); } + // Accumulate the concat dimension from all tensors taking part to the + // operation. 
+ concat_dimensions[concat_dim] += + ShapeUtil::GetDimension(operand_shape, concat_dim); + } + + auto literal = LiteralUtil::CreateFromDimensions( + reference_shape.element_type(), concat_dimensions); + std::vector source_indices(rank, 0); + std::vector dest_indices(concat_dimensions.size(), 0); + for (auto operand : operands) { + const Shape& operand_shape = operand->shape(); + TF_RETURN_IF_ERROR(LiteralUtil::Copy( + operand->literal(), source_indices, literal.get(), dest_indices, + AsInt64Slice(operand_shape.dimensions()))); + dest_indices[concat_dim] += + ShapeUtil::GetDimension(operand_shape, concat_dim); } + return ReplaceWithConstant(concatenate, std::move(literal)); } - return changed; + return Status::OK(); +} + +Status ConstantFolderVisitor::HandleSlice(HloInstruction* slice, + HloInstruction* operand) { + if (operand->opcode() == HloOpcode::kConstant) { + const Shape& shape = slice->shape(); + auto literal = LiteralUtil::CreateFromDimensions( + shape.element_type(), AsInt64Slice(shape.dimensions())); + std::vector dest_indices(slice->slice_starts().size(), 0); + TF_RETURN_IF_ERROR(LiteralUtil::Copy( + operand->literal(), slice->slice_starts(), literal.get(), dest_indices, + AsInt64Slice(shape.dimensions()))); + TF_RETURN_IF_ERROR(ReplaceWithConstant(slice, std::move(literal))); + } + return Status::OK(); +} + +Status ConstantFolderVisitor::HandleConvert(HloInstruction* convert, + HloInstruction* operand) { + if (operand->opcode() == HloOpcode::kConstant) { + const Literal& src_literal = operand->literal(); + std::unique_ptr new_constant = + ConvertIfSrcTypeMatches(src_literal, convert->shape().element_type()); + return ReplaceWithConstant(convert, std::move(new_constant)); + } + return Status::OK(); } } // namespace xla diff --git a/tensorflow/compiler/xla/service/hlo_constant_folding.h b/tensorflow/compiler/xla/service/hlo_constant_folding.h index 514bb8164c..f45eccf825 100644 --- a/tensorflow/compiler/xla/service/hlo_constant_folding.h +++ b/tensorflow/compiler/xla/service/hlo_constant_folding.h @@ -25,12 +25,10 @@ namespace xla { // computation on constants. class HloConstantFolding : public HloPassInterface { public: - explicit HloConstantFolding() {} - ~HloConstantFolding() override {} tensorflow::StringPiece name() const override { return "constant_folding"; } - // Run ConstantFolding on the given module. Returns whether the module was - // changed (common subexpressions were found and eliminated). + // Run constant folding operations on the given module. Returns whether the + // module was changed (constant expressions folded). StatusOr Run(HloModule* module) override; }; diff --git a/tensorflow/compiler/xla/service/hlo_constant_folding_test.cc b/tensorflow/compiler/xla/service/hlo_constant_folding_test.cc new file mode 100644 index 0000000000..d20f423bd6 --- /dev/null +++ b/tensorflow/compiler/xla/service/hlo_constant_folding_test.cc @@ -0,0 +1,169 @@ +/* Copyright 2017 The TensorFlow Authors. All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+==============================================================================*/ + +#include "tensorflow/compiler/xla/service/hlo_constant_folding.h" + +#include +#include + +#include "tensorflow/compiler/xla/layout_util.h" +#include "tensorflow/compiler/xla/literal_util.h" +#include "tensorflow/compiler/xla/service/hlo_computation.h" +#include "tensorflow/compiler/xla/service/hlo_instruction.h" +#include "tensorflow/compiler/xla/service/hlo_matchers.h" +#include "tensorflow/compiler/xla/service/hlo_opcode.h" +#include "tensorflow/compiler/xla/service/hlo_pass_fix.h" +#include "tensorflow/compiler/xla/shape_util.h" +#include "tensorflow/compiler/xla/test.h" +#include "tensorflow/compiler/xla/tests/hlo_test_base.h" +#include "tensorflow/compiler/xla/types.h" + +namespace op = xla::testing::opcode_matchers; + +namespace xla { +namespace { + +using HloConstantFoldingTest = HloTestBase; + +TEST_F(HloConstantFoldingTest, ConvertF32ToS64) { + HloComputation::Builder builder(TestName()); + HloInstruction* input = builder.AddInstruction( + HloInstruction::CreateConstant(LiteralUtil::CreateR0(42.0f))); + builder.AddInstruction( + HloInstruction::CreateConvert(ShapeUtil::MakeShape(S64, {}), input)); + + auto module = MakeUnique(TestName()); + auto computation = module->AddEntryComputation(builder.Build()); + + EXPECT_THAT(computation->root_instruction(), op::Convert(input)); + + HloConstantFolding simplifier; + ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); + + EXPECT_THAT(computation->root_instruction(), op::Constant()); + EXPECT_EQ(LiteralUtil::GetFirstElement( + computation->root_instruction()->literal()), + 42); +} + +TEST_F(HloConstantFoldingTest, ConvertS64ToF32) { + HloComputation::Builder builder(TestName()); + HloInstruction* input = builder.AddInstruction( + HloInstruction::CreateConstant(LiteralUtil::CreateR0(42))); + builder.AddInstruction( + HloInstruction::CreateConvert(ShapeUtil::MakeShape(F32, {}), input)); + + auto module = MakeUnique(TestName()); + auto computation = module->AddEntryComputation(builder.Build()); + + EXPECT_THAT(computation->root_instruction(), op::Convert(input)); + + HloConstantFolding simplifier; + ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); + + EXPECT_THAT(computation->root_instruction(), op::Constant()); + EXPECT_EQ(LiteralUtil::GetFirstElement( + computation->root_instruction()->literal()), + 42.0f); +} + +TEST_F(HloConstantFoldingTest, ConvertF32ArrayToS64Array) { + HloComputation::Builder builder(TestName()); + HloInstruction* input = builder.AddInstruction(HloInstruction::CreateConstant( + LiteralUtil::CreateR1({42.0f, 19.0f}))); + builder.AddInstruction( + HloInstruction::CreateConvert(ShapeUtil::MakeShape(S64, {2}), input)); + + auto module = MakeUnique(TestName()); + auto computation = module->AddEntryComputation(builder.Build()); + + EXPECT_THAT(computation->root_instruction(), op::Convert(input)); + + HloConstantFolding simplifier; + ASSERT_TRUE(simplifier.Run(module.get()).ValueOrDie()); + + EXPECT_THAT(computation->root_instruction(), op::Constant()); + EXPECT_EQ( + LiteralUtil::Get(computation->root_instruction()->literal(), {0}), + 42); + EXPECT_EQ( + LiteralUtil::Get(computation->root_instruction()->literal(), {1}), + 19); +} + +TEST_F(HloConstantFoldingTest, Concatenate) { + const struct TestConfig { + int concat_dimension; + tensorflow::gtl::ArraySlice dimensions; + tensorflow::gtl::ArraySlice concat_sizes; + } test_configs[] = { + {1, {11, 0, 7, 5, 9}, {2, 5, 7, 11}}, + {3, {1, 4, 17, 0, 8}, {1, 3, 9, 12}}, + }; + + for 
(auto& test_config : test_configs) { + HloComputation::Builder builder(TestName()); + std::vector dimensions(test_config.dimensions.begin(), + test_config.dimensions.end()); + int64 concat_size = 0; + std::vector operands; + for (auto csize : test_config.concat_sizes) { + dimensions[test_config.concat_dimension] = csize; + concat_size += csize; + auto literal = LiteralUtil::CreateFromDimensions(F32, dimensions); + HloInstruction* insn = builder.AddInstruction( + HloInstruction::CreateConstant(std::move(literal))); + operands.push_back(insn); + } + dimensions[test_config.concat_dimension] = concat_size; + Shape shape = ShapeUtil::MakeShape(F32, dimensions); + builder.AddInstruction(HloInstruction::CreateConcatenate( + shape, operands, test_config.concat_dimension)); + HloModule module(TestName()); + auto computation = module.AddEntryComputation(builder.Build()); + + HloConstantFolding simplifier; + ASSERT_TRUE(simplifier.Run(&module).ValueOrDie()); + + HloInstruction* root = computation->root_instruction(); + EXPECT_THAT(root, op::Constant()); + EXPECT_TRUE(ShapeUtil::Equal(root->shape(), shape)); + } +} + +TEST_F(HloConstantFoldingTest, Slice) { + HloComputation::Builder builder(TestName()); + const int64 dimensions[] = {11, 8, 7, 5, 9}; + const int64 slice_start[] = {4, 2, 3, 1, 5}; + const int64 slice_limits[] = {10, 8, 6, 5, 9}; + auto literal = LiteralUtil::CreateFromDimensions(F32, dimensions); + HloInstruction* lit_insn = builder.AddInstruction( + HloInstruction::CreateConstant(std::move(literal))); + Shape shape = ShapeUtil::MakeShape(F32, {6, 6, 3, 4, 4}); + builder.AddInstruction( + HloInstruction::CreateSlice(shape, lit_insn, slice_start, slice_limits)); + HloModule module(TestName()); + auto computation = module.AddEntryComputation(builder.Build()); + + HloConstantFolding simplifier; + ASSERT_TRUE(simplifier.Run(&module).ValueOrDie()); + + HloInstruction* root = computation->root_instruction(); + EXPECT_THAT(root, op::Constant()); + EXPECT_TRUE(ShapeUtil::Equal(root->shape(), shape)); +} + +} // namespace +} // namespace xla diff --git a/tensorflow/compiler/xla/service/hlo_evaluator.cc b/tensorflow/compiler/xla/service/hlo_evaluator.cc index ebe7428052..1b3babc214 100644 --- a/tensorflow/compiler/xla/service/hlo_evaluator.cc +++ b/tensorflow/compiler/xla/service/hlo_evaluator.cc @@ -26,20 +26,21 @@ limitations under the License. 
#include "tensorflow/compiler/xla/index_util.h" #include "tensorflow/compiler/xla/layout_util.h" #include "tensorflow/compiler/xla/literal_util.h" +#include "tensorflow/compiler/xla/map_util.h" #include "tensorflow/compiler/xla/primitive_util.h" #include "tensorflow/compiler/xla/ptr_util.h" #include "tensorflow/compiler/xla/service/hlo_opcode.h" #include "tensorflow/compiler/xla/service/hlo_query.h" #include "tensorflow/compiler/xla/shape_util.h" +#include "tensorflow/compiler/xla/status.h" +#include "tensorflow/compiler/xla/status_macros.h" #include "tensorflow/compiler/xla/types.h" #include "tensorflow/compiler/xla/util.h" #include "tensorflow/core/lib/core/bitmap.h" +#include "tensorflow/core/lib/core/errors.h" #include "tensorflow/core/lib/core/status.h" #include "tensorflow/core/lib/core/stringpiece.h" -#include "tensorflow/core/lib/gtl/array_slice.h" -#include "tensorflow/core/lib/gtl/inlined_vector.h" #include "tensorflow/core/platform/logging.h" -#include "tensorflow/core/platform/macros.h" #include "tensorflow/core/platform/protobuf.h" #include "tensorflow/core/platform/types.h" @@ -53,9 +54,7 @@ std::unique_ptr ElementWiseUnaryOp( const Literal& operand) { DCHECK(ShapeUtil::SameDimensions(shape, operand.shape())); - auto result = MakeUnique(); - *result->mutable_shape() = shape; - LiteralUtil::Reserve(ShapeUtil::ElementsIn(shape), result.get()); + auto result = LiteralUtil::CreateFromShape(shape); std::vector multi_index(ShapeUtil::Rank(result->shape()), 0); do { @@ -74,9 +73,7 @@ std::unique_ptr ElementWiseBinaryOp( DCHECK(ShapeUtil::SameDimensions(shape, rhs.shape())); DCHECK(ShapeUtil::SameDimensions(lhs.shape(), rhs.shape())); - auto result = MakeUnique(); - *result->mutable_shape() = shape; - LiteralUtil::Reserve(ShapeUtil::ElementsIn(shape), result.get()); + auto result = LiteralUtil::CreateFromShape(shape); std::vector multi_index(ShapeUtil::Rank(result->shape()), 0); do { @@ -99,9 +96,7 @@ std::unique_ptr ElementWiseTernaryOp( DCHECK(ShapeUtil::SameDimensions(lhs.shape(), rhs.shape())); DCHECK(ShapeUtil::SameDimensions(rhs.shape(), ehs.shape())); - auto result = MakeUnique(); - *result->mutable_shape() = shape; - LiteralUtil::Reserve(ShapeUtil::ElementsIn(shape), result.get()); + auto result = LiteralUtil::CreateFromShape(shape); std::vector multi_index(ShapeUtil::Rank(result->shape()), 0); do { @@ -130,29 +125,130 @@ NativeT AbsoluteVal(NativeT value) { return std::abs(value); } -template -StatusOr> EvaluateOpForLiteralInternal( +} // namespace + +Status HloEvaluator::DefaultAction(HloInstruction* hlo) { + VLOG(2) << "Handle instruction: " << hlo->ToString(); + Shape shape = hlo->shape(); + TF_CHECK_OK(ShapeUtil::ValidateShape(shape)); + + TF_ASSIGN_OR_RETURN(evaluated_[hlo], EvaluateBasedOnType(hlo)); + return Status::OK(); +} + +Status HloEvaluator::HandleParameter(HloInstruction* parameter) { + VLOG(2) << "HandleParameter: " << parameter->ToString(); + const Literal* input_literal = arg_literals_[parameter->parameter_number()]; + VLOG(2) << "Parameter evaluated to: " + << LiteralUtil::ToString(*input_literal); + CHECK(ShapeUtil::Equal(parameter->shape(), input_literal->shape())); + + evaluated_[parameter] = MakeUnique(*input_literal); + return Status::OK(); +} + +Status HloEvaluator::HandleConstant(HloInstruction* constant, + const Literal& literal) { + VLOG(2) << "HandleConstant: " << constant->ToString(); + CHECK(ShapeUtil::Equal(constant->shape(), literal.shape())); + + evaluated_[constant] = MakeUnique(literal); + return Status::OK(); +} + +StatusOr> 
HloEvaluator::Evaluate( + HloComputation* computation, + tensorflow::gtl::ArraySlice<const Literal*> args) { + arg_literals_ = args; + TF_RETURN_IF_ERROR(computation->Accept(this)); + return std::move(FindOrDie(evaluated_, computation->root_instruction())); +} + +StatusOr<std::unique_ptr<Literal>> HloEvaluator::Evaluate( + HloInstruction* instruction, + tensorflow::gtl::ArraySlice<const Literal*> args) { + DCHECK(hlo_query::AllOperandsAreParametersOrConstants(*instruction)); + Shape shape = instruction->shape(); + TF_CHECK_OK(ShapeUtil::ValidateShape(shape)); + + arg_literals_ = args; + + // Evaluate operands of Parameter type against the input literals, caching + // the evaluated literal results. + for (const auto operand : instruction->operands()) { + if (operand->opcode() == HloOpcode::kParameter) { + TF_CHECK_OK(HandleParameter(operand)); + } else if (operand->opcode() == HloOpcode::kConstant) { + evaluated_[operand] = MakeUnique<Literal>(operand->literal()); + } + } + + TF_RETURN_IF_ERROR(instruction->Visit(this)); + return std::move(FindOrDie(evaluated_, instruction)); +} + +StatusOr<std::unique_ptr<Literal>> HloEvaluator::EvaluateBasedOnType( HloInstruction* instruction) { - DCHECK(hlo_query::AllOperandsAreConstants(*instruction)); + Shape shape = instruction->shape(); + TF_CHECK_OK(ShapeUtil::ValidateShape(shape)); + + switch (shape.element_type()) { + case PRED: + return EvaluateSameTypedElementwise<bool>(instruction); + case U8: + return EvaluateSameTypedElementwise<uint8>(instruction); + case U16: + return Unimplemented("unhandled primitive type: %s.", + PrimitiveType_Name(U16).c_str()); + case U32: + return EvaluateSameTypedElementwise<uint32>(instruction); + case U64: + return EvaluateSameTypedElementwise<uint64>(instruction); + case S8: + return EvaluateSameTypedElementwise<int8>(instruction); + case S16: + return Unimplemented("unhandled primitive type: %s.", + PrimitiveType_Name(S16).c_str()); + case S32: + return EvaluateSameTypedElementwise<int32>(instruction); + case S64: + return EvaluateSameTypedElementwise<int64>(instruction); + case F16: + return Unimplemented("unhandled primitive type: %s.", + PrimitiveType_Name(F16).c_str()); + case F32: + return EvaluateSameTypedElementwise<float>(instruction); + case F64: + return EvaluateSameTypedElementwise<double>(instruction); + default: + return Unimplemented("unhandled primitive type: %s.", + PrimitiveType_Name(shape.element_type()).c_str()); + } +} +template <typename NativeT> +StatusOr<std::unique_ptr<Literal>> HloEvaluator::EvaluateSameTypedElementwise( + HloInstruction* instruction) { const std::vector<HloInstruction*>& operands = instruction->operands(); HloOpcode opcode = instruction->opcode(); const Shape& shape = instruction->shape(); switch (opcode) { // TODO(b/35950897): many of the stl functions used here are not overloaded - // for all XLA primitive types. + // for every XLA primitive type. + // Unary element-wise ops. + // case HloOpcode::kAbs: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return AbsoluteVal(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kCeil: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return std::ceil(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kConvert: CHECK_EQ(operands.size(), 1); // TODO(b/35950897): implement Convert.
@@ -162,37 +258,37 @@ StatusOr> EvaluateOpForLiteralInternal( CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return operand; }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kExp: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return std::exp(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kFloor: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return std::floor(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kIsFinite: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return std::isfinite(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kLog: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return std::log(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kLogicalNot: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return !operand; }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kNegate: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return -operand; }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kSign: CHECK_EQ(operands.size(), 1); CHECK(primitive_util::IsIntegralType(shape.element_type())); @@ -201,95 +297,113 @@ StatusOr> EvaluateOpForLiteralInternal( return (NativeT(0) < operand) - (operand < NativeT(0)); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); case HloOpcode::kTanh: CHECK_EQ(operands.size(), 1); return ElementWiseUnaryOp( shape, [](NativeT operand) { return std::tanh(operand); }, - operands[0]->literal()); + GetEvaluatedLiteralFor(operands[0])); // Binary element-wise ops. 
+ // case HloOpcode::kAdd: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs + rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kDivide: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs / rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kMultiply: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs * rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kSubtract: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs - rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kEq: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs == rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kGe: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs >= rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kGt: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs > rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kLe: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs <= rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kLt: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs < rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kNe: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs != rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kMaximum: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return std::max(lhs, rhs); }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kMinimum: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return std::min(lhs, rhs); }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kPower: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return std::pow(lhs, rhs); }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case 
HloOpcode::kRemainder: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return std::remainder(lhs, rhs); }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kLogicalAnd: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs && rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); case HloOpcode::kLogicalOr: CHECK_EQ(operands.size(), 2); return ElementWiseBinaryOp( shape, [](NativeT lhs, NativeT rhs) { return lhs || rhs; }, - operands[0]->literal(), operands[1]->literal()); + GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1])); // Ternary element-wise ops. + // case HloOpcode::kClamp: { CHECK_EQ(operands.size(), 3); std::function clamp_op = @@ -297,8 +411,9 @@ StatusOr> EvaluateOpForLiteralInternal( return std::max(low, std::min(value, high)); }; return ElementWiseTernaryOp( - shape, std::move(clamp_op), operands[0]->literal(), - operands[1]->literal(), operands[2]->literal()); + shape, std::move(clamp_op), GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1]), + GetEvaluatedLiteralFor(operands[2])); } break; case HloOpcode::kSelect: { CHECK_EQ(operands.size(), 3); @@ -311,8 +426,9 @@ StatusOr> EvaluateOpForLiteralInternal( return on_false; }; return ElementWiseTernaryOp( - shape, std::move(select_op), operands[0]->literal(), - operands[1]->literal(), operands[2]->literal()); + shape, std::move(select_op), GetEvaluatedLiteralFor(operands[0]), + GetEvaluatedLiteralFor(operands[1]), + GetEvaluatedLiteralFor(operands[2])); } break; default: return Unimplemented("unhandled HLO ops for HloEvaluator: %s.", @@ -320,48 +436,4 @@ StatusOr> EvaluateOpForLiteralInternal( } } -} // namespace - -/* static */ StatusOr> -HloEvaluator::EvaluateOpForLiteral(HloInstruction* instruction) { - DCHECK(hlo_query::AllOperandsAreConstants(*instruction)); - - Shape shape = instruction->shape(); - TF_CHECK_OK(ShapeUtil::ValidateShape(shape)); - - // REVIEW QUESTION: other than a few operations, do we need to handle the - // general case of operands being of different types in the context of the - // evaluator? 
- - switch (shape.element_type()) { - case PRED: - return EvaluateOpForLiteralInternal<bool>(instruction); - case U8: - return EvaluateOpForLiteralInternal<uint8>(instruction); - case U16: - LOG(FATAL) << "U16/uint16 is unimplemented."; - case U32: - return EvaluateOpForLiteralInternal<uint32>(instruction); - case U64: - return EvaluateOpForLiteralInternal<uint64>(instruction); - case S8: - return EvaluateOpForLiteralInternal<int8>(instruction); - case S16: - LOG(FATAL) << "S16/int16 is unimplemented."; - case S32: - return EvaluateOpForLiteralInternal<int32>(instruction); - case S64: - return EvaluateOpForLiteralInternal<int64>(instruction); - case F16: - LOG(FATAL) << "F16 is unimplemented."; - case F32: - return EvaluateOpForLiteralInternal<float>(instruction); - case F64: - return EvaluateOpForLiteralInternal<double>(instruction); - default: - return Unimplemented("unhandled primitive type: %s.", - PrimitiveType_Name(shape.element_type()).c_str()); - } -} - } // namespace xla diff --git a/tensorflow/compiler/xla/service/hlo_evaluator.h b/tensorflow/compiler/xla/service/hlo_evaluator.h index c6ec650d67..6372a6c269 100644 --- a/tensorflow/compiler/xla/service/hlo_evaluator.h +++ b/tensorflow/compiler/xla/service/hlo_evaluator.h @@ -18,22 +18,89 @@ limitations under the License. #include <memory> +#include "tensorflow/compiler/xla/service/dfs_hlo_visitor_with_default.h" +#include "tensorflow/compiler/xla/service/hlo_computation.h" #include "tensorflow/compiler/xla/service/hlo_instruction.h" #include "tensorflow/compiler/xla/statusor.h" +#include "tensorflow/compiler/xla/util.h" #include "tensorflow/compiler/xla/xla_data.pb.h" +#include "tensorflow/core/lib/gtl/array_slice.h" +#include "tensorflow/core/lib/gtl/flatmap.h" +#include "tensorflow/core/platform/macros.h" namespace xla { -// Responsible for evaluating a HLO instruction with constant operands. -class HloEvaluator { +// Responsible for evaluating HLO and obtaining literals as the evaluation results. +// +// This class is not thread-safe. +class HloEvaluator : public DfsHloVisitorWithDefault { public: - // Evaluates a single HLO instruction for constants and return the result as a - // Literal. - // Precondition: all operands of the instruction are constants, instruction is - // valid with corresponding number of operands for the given operator. + HloEvaluator() {} + ~HloEvaluator() override {} + + // Evaluates a HLO computation against an array of pointers to literals. + // Returns the evaluated result as a literal if successful. + // Precondition: argument literals are in post-order corresponding to the + // input instruction's parameters. + StatusOr<std::unique_ptr<Literal>> Evaluate( + HloComputation* computation, + tensorflow::gtl::ArraySlice<const Literal*> arg_literals); + + // Evaluates a single HLO instruction against an array of pointers to literals. + // Returns the evaluated result as a literal if successful. + // Precondition: + // 1. argument literals are in post-order corresponding to the input + // instruction's parameters. + // 2. the instruction's operands must be of either Parameter or Constant type. // TODO(b/35950897): implement more ops other than element-wise ops. - static StatusOr<std::unique_ptr<Literal>> EvaluateOpForLiteral( + // TODO(b/35950897): handle broadcasts. + StatusOr<std::unique_ptr<Literal>> Evaluate( + HloInstruction* instruction, + tensorflow::gtl::ArraySlice<const Literal*> arg_literals); + + protected: + // The following methods implement the DfsHloVisitor interface. + // + // DefaultAction here handles all non-specialized instructions (i.e., instructions + // without a corresponding Handle* method).
+ // TODO(b/35950897): it's likely better to refactor the switches here and push + // up the switch to templated methods instead, likely at the DfsHloVisitor level. + Status DefaultAction(HloInstruction* hlo_instruction) override; + + Status HandleParameter(HloInstruction* parameter) override; + Status HandleConstant(HloInstruction* constant, + const Literal& literal) override; + + private: + // Evaluates a single HLO instruction and returns the result as a Literal if + // successful. A Status will be returned on error. + StatusOr<std::unique_ptr<Literal>> EvaluateBasedOnType( HloInstruction* instruction); + + // Evaluates an element-wise HLO instruction that has the same output literal + // type as the operands' types. + template <typename NativeT> + StatusOr<std::unique_ptr<Literal>> EvaluateSameTypedElementwise( + HloInstruction* instruction); + + // Returns the already-evaluated literal result for the instruction. + // Crashes with a log message if the given instruction has not been evaluated + // previously. + const Literal& GetEvaluatedLiteralFor(const HloInstruction* hlo) { + auto it = evaluated_.find(hlo); + CHECK(it != evaluated_.end()) + << "could not find evaluated value for: " << hlo->ToString(); + return *(it->second); + } + + // Tracks each HLO instruction and its evaluated literal result. + tensorflow::gtl::FlatMap<const HloInstruction*, std::unique_ptr<Literal>> + evaluated_; + // Stores input literals, assuming they are in post-order. Literals are not + // owned by this class, and they must outlive the lifetime of the instance of + // this class. + tensorflow::gtl::ArraySlice<const Literal*> arg_literals_; + + TF_DISALLOW_COPY_AND_ASSIGN(HloEvaluator); }; } // namespace xla diff --git a/tensorflow/compiler/xla/service/hlo_evaluator_test.cc b/tensorflow/compiler/xla/service/hlo_evaluator_test.cc index 585fe65def..443e5ad4f4 100644 --- a/tensorflow/compiler/xla/service/hlo_evaluator_test.cc +++ b/tensorflow/compiler/xla/service/hlo_evaluator_test.cc @@ -14,10 +14,13 @@ limitations under the License. ==============================================================================*/ #include "tensorflow/compiler/xla/service/hlo_evaluator.h" +#include #include #include +#include #include "tensorflow/compiler/xla/literal_util.h" +#include "tensorflow/compiler/xla/service/hlo_computation.h" #include "tensorflow/compiler/xla/service/hlo_instruction.h" #include "tensorflow/compiler/xla/shape_util.h" #include "tensorflow/compiler/xla/statusor.h" @@ -29,9 +32,16 @@ limitations under the License. namespace xla { namespace { +class HloEvaluatorTest : public ::testing::Test { + protected: + HloEvaluatorTest() { evaluator_ = MakeUnique<HloEvaluator>(); } + + std::unique_ptr<HloEvaluator> evaluator_; +}; + // Verifies that HloEvaluator evaluates a HLO instruction that performs clamp // with 3 operands. -TEST(HloEvaluatorTest, DoesClamp) { +TEST_F(HloEvaluatorTest, DoesClamp) { auto low = LiteralUtil::CreateR2<float>({{0.f, 2.f}, {2.f, 4.f}}); auto high = LiteralUtil::CreateR2<float>({{2.f, 4.f}, {4.f, 4.f}}); auto value = LiteralUtil::CreateR2<float>({{0.f, 5.f}, {0.f, 4.f}}); @@ -44,7 +54,7 @@ TEST(HloEvaluatorTest, DoesClamp) { shape, HloOpcode::kClamp, c1.get(), c2.get(), c3.get()); std::unique_ptr<Literal> result = - HloEvaluator::EvaluateOpForLiteral(instruction.get()).ConsumeValueOrDie(); + evaluator_->Evaluate(instruction.get(), {}).ConsumeValueOrDie(); auto expected = LiteralUtil::CreateR2<float>({{0, 4}, {2, 4}}); @@ -53,7 +63,7 @@ // Verifies that HloEvaluator evaluates a HLO instruction that performs select // with 3 operands.
-TEST(HloEvaluatorTest, DoesSelect) { +TEST_F(HloEvaluatorTest, DoesSelect) { auto pred = LiteralUtil::CreateR2({{true, false}, {false, true}}); auto on_true = LiteralUtil::CreateR2({{2.f, 4.f}, {4.f, 4.f}}); auto on_false = LiteralUtil::CreateR2({{0.f, 5.f}, {0.f, 4.f}}); @@ -66,7 +76,7 @@ TEST(HloEvaluatorTest, DoesSelect) { shape, HloOpcode::kSelect, c1.get(), c2.get(), c3.get()); std::unique_ptr result = - HloEvaluator::EvaluateOpForLiteral(instruction.get()).ConsumeValueOrDie(); + evaluator_->Evaluate(instruction.get(), {}).ConsumeValueOrDie(); auto expected = LiteralUtil::CreateR2({{2, 5}, {0, 4}}); @@ -75,7 +85,7 @@ TEST(HloEvaluatorTest, DoesSelect) { // Verifies that HloEvaluator evaluates a HLO instruction that performs // element-wise addition with 2 operands. -TEST(HloEvaluatorTest, DoesAdd) { +TEST_F(HloEvaluatorTest, DoesAdd) { auto lhs = LiteralUtil::CreateR2({{1, 0}, {-100, 4}}); auto rhs = LiteralUtil::CreateR2({{2, 4}, {4, 4}}); @@ -86,7 +96,7 @@ TEST(HloEvaluatorTest, DoesAdd) { HloInstruction::CreateBinary(shape, HloOpcode::kAdd, c1.get(), c2.get()); std::unique_ptr result = - HloEvaluator::EvaluateOpForLiteral(instruction.get()).ConsumeValueOrDie(); + evaluator_->Evaluate(instruction.get(), {}).ConsumeValueOrDie(); auto expected = LiteralUtil::CreateR2({{3, 4}, {-96, 8}}); @@ -95,7 +105,7 @@ TEST(HloEvaluatorTest, DoesAdd) { // Verifies that HloEvaluator evaluates a HLO instruction that performs // element-wise divide with 2 operands. -TEST(HloEvaluatorTest, DoesDivide) { +TEST_F(HloEvaluatorTest, DoesDivide) { auto lhs_s64 = LiteralUtil::CreateR2({{1, 0}, {-100, 4}}); auto rhs_s64 = LiteralUtil::CreateR2({{2, 4}, {4, 4}}); @@ -106,7 +116,7 @@ TEST(HloEvaluatorTest, DoesDivide) { c1_s64.get(), c2_s64.get()); std::unique_ptr result = - HloEvaluator::EvaluateOpForLiteral(instruction.get()).ConsumeValueOrDie(); + evaluator_->Evaluate(instruction.get(), {}).ConsumeValueOrDie(); auto expected = LiteralUtil::CreateR2({{0, 0}, {-25, 1}}); @@ -121,8 +131,7 @@ TEST(HloEvaluatorTest, DoesDivide) { instruction = HloInstruction::CreateBinary(shape_f64, HloOpcode::kDivide, c1_f64.get(), c2_f64.get()); - result = - HloEvaluator::EvaluateOpForLiteral(instruction.get()).ConsumeValueOrDie(); + result = evaluator_->Evaluate(instruction.get(), {}).ConsumeValueOrDie(); expected = LiteralUtil::CreateR2({{0.45454545454545453, 0}, {-25, 1}}); @@ -132,21 +141,51 @@ TEST(HloEvaluatorTest, DoesDivide) { // Verifies that HloEvaluator evaluates a HLO instruction that performs // element-wise abs op with 1 operand. -TEST(HloEvaluatorTest, DoesAbs) { +TEST_F(HloEvaluatorTest, DoesAbs) { auto operand = LiteralUtil::CreateR2({{1, -20}, {-100, 4}}); - Shape shape = ShapeUtil::MakeShape(S64, {2, 2}); auto c1 = HloInstruction::CreateConstant(std::move(operand)); auto instruction = HloInstruction::CreateUnary(shape, HloOpcode::kAbs, c1.get()); std::unique_ptr result = - HloEvaluator::EvaluateOpForLiteral(instruction.get()).ConsumeValueOrDie(); + evaluator_->Evaluate(instruction.get(), {}).ConsumeValueOrDie(); auto expected = LiteralUtil::CreateR2({{1, 20}, {100, 4}}); EXPECT_TRUE(LiteralUtil::Equal(*result, *expected)); } +// Verifies that HloEvaluator evaluates a HLO Computation with non-parameter nor +// constant operands. 
+TEST_F(HloEvaluatorTest, DoesTraverseInstructions) { + HloComputation::Builder builder( + ::testing::UnitTest::GetInstance()->current_test_info()->name()); + + auto lhs = LiteralUtil::CreateR2<int64>({{1, 0}, {-100, 4}}); + auto rhs = LiteralUtil::CreateR2<int64>({{2, 4}, {4, 4}}); + auto rhs2 = LiteralUtil::CreateR2<int64>({{1, -20}, {-100, 4}}); + std::vector<const Literal*> args = {lhs.get(), rhs.get(), rhs2.get()}; + + Shape shape = ShapeUtil::MakeShape(S64, {2, 2}); + + auto param_lhs = HloInstruction::CreateParameter(0, shape, "lhs"); + auto param_rhs = HloInstruction::CreateParameter(1, shape, "rhs"); + auto lhs_instruction = HloInstruction::CreateBinary( + shape, HloOpcode::kAdd, param_lhs.get(), param_rhs.get()); + + auto param_rhs2 = HloInstruction::CreateParameter(2, shape, "rhs2"); + auto root_instruction = HloInstruction::CreateBinary( + shape, HloOpcode::kAdd, lhs_instruction.get(), param_rhs2.get()); + + builder.AddInstruction(std::move(root_instruction)); + std::unique_ptr<Literal> result = + evaluator_->Evaluate(builder.Build().get(), args).ConsumeValueOrDie(); + + auto expected = LiteralUtil::CreateR2<int64>({{4, -16}, {-196, 12}}); + + EXPECT_TRUE(LiteralUtil::Equal(*result, *expected)); +} + } // namespace } // namespace xla diff --git a/tensorflow/compiler/xla/service/hlo_query.cc b/tensorflow/compiler/xla/service/hlo_query.cc index d699737864..a153d73dbd 100644 --- a/tensorflow/compiler/xla/service/hlo_query.cc +++ b/tensorflow/compiler/xla/service/hlo_query.cc @@ -32,6 +32,16 @@ bool IsConstantR0F32(HloInstruction* instruction, float* out) { return false; } +bool AllOperandsAreParametersOrConstants(const HloInstruction& instruction) { + for (const auto& operand : instruction.operands()) { + if (operand->opcode() != HloOpcode::kParameter && + operand->opcode() != HloOpcode::kConstant) { + return false; + } + } + return true; +} + bool AllOperandsAreParameters(const HloInstruction& instruction) { for (const auto& operand : instruction.operands()) { if (operand->opcode() != HloOpcode::kParameter) { diff --git a/tensorflow/compiler/xla/service/hlo_query.h b/tensorflow/compiler/xla/service/hlo_query.h index 56f3cfd863..c79347bbf9 100644 --- a/tensorflow/compiler/xla/service/hlo_query.h +++ b/tensorflow/compiler/xla/service/hlo_query.h @@ -28,6 +28,10 @@ namespace hlo_query { // Precondition: out != nullptr bool IsConstantR0F32(HloInstruction* instruction, float* out); +// Returns whether all of an instruction's operands are either constants +// or parameters. +bool AllOperandsAreParametersOrConstants(const HloInstruction& instruction); + // Returns whether all of an instruction's operands are parameters. bool AllOperandsAreParameters(const HloInstruction& instruction); diff --git a/tensorflow/compiler/xla/service/hlo_tfgraph_builder.cc b/tensorflow/compiler/xla/service/hlo_tfgraph_builder.cc index fdc1c0ba2d..da07dea123 100644 --- a/tensorflow/compiler/xla/service/hlo_tfgraph_builder.cc +++ b/tensorflow/compiler/xla/service/hlo_tfgraph_builder.cc @@ -88,12 +88,18 @@ const string& HloTfGraphBuilder::GetNodeNameForInstruction( if (ContainsKey(instruction_to_node_name_, instruction)) { return instruction_to_node_name_[instruction]; } + string node_name; // If an instruction is fused, put it in the subgraph of the fusion; // otherwise, put it in the computation subgraph. - string node_name = - instruction->IsFused() - ?
GetNodeNameForInstruction(instruction->fusion_instruction()) - : instruction->parent()->name(); + if (instruction->IsFused()) { + node_name = GetNodeNameForInstruction(instruction->fusion_instruction()); + } else { + node_name = instruction->parent()->name(); + if (!instruction->metadata().op_name().empty()) { + // Always make computations contain TF ops but not the other way around. + StrAppend(&node_name, "/", instruction->metadata().op_name()); + } + } string instruction_name = instruction->name(); if (instruction->opcode() == HloOpcode::kParameter) { StrAppend(&instruction_name, ".", instruction->parameter_number()); diff --git a/tensorflow/compiler/xla/service/hlo_tfgraph_builder_test.cc b/tensorflow/compiler/xla/service/hlo_tfgraph_builder_test.cc index df66408022..6041debc4a 100644 --- a/tensorflow/compiler/xla/service/hlo_tfgraph_builder_test.cc +++ b/tensorflow/compiler/xla/service/hlo_tfgraph_builder_test.cc @@ -137,6 +137,28 @@ TEST_F(HloTfGraphBuilderTest, GreaterThanOrEqualTo) { EXPECT_EQ(graph_def.node(2).op(), "HloGreaterThanOrEqualTo"); } +TEST_F(HloTfGraphBuilderTest, IncorporateTfOpsStructure) { + auto builder = HloComputation::Builder("GE"); + auto param_1 = builder.AddInstruction( + HloInstruction::CreateParameter(0, r0f32_, "param0")); + auto param_2 = builder.AddInstruction( + HloInstruction::CreateParameter(1, r0f32_, "param1")); + auto ge = builder.AddInstruction( + HloInstruction::CreateBinary(r0f32_, HloOpcode::kGe, param_1, param_2)); + OpMetadata metadata; + metadata.set_op_name("x/y"); + metadata.set_op_type("Y"); + ge->set_metadata(metadata); + TF_CHECK_OK(generator_.AddComputation(*builder.Build())); + GraphDef graph_def = generator_.GetGraphDef(); + EXPECT_EQ(graph_def.node_size(), 3); + EXPECT_EQ(graph_def.node(0).name(), "GE/param0.0"); + EXPECT_EQ(graph_def.node(1).name(), "GE/param1.1"); + EXPECT_EQ(graph_def.node(2).input_size(), 2); + EXPECT_EQ(graph_def.node(2).name(), "GE/x/y/greater-than-or-equal-to"); + EXPECT_EQ(graph_def.node(2).op(), "HloGreaterThanOrEqualTo"); +} + TEST_F(HloTfGraphBuilderTest, EmbeddedComputationsDiamond) { // Create computations with a diamond-shaped callgraph.
auto negate_computation = CreateNegateComputation(); diff --git a/tensorflow/contrib/distributions/BUILD b/tensorflow/contrib/distributions/BUILD index 1b9bd6ad91..9f675c6613 100644 --- a/tensorflow/contrib/distributions/BUILD +++ b/tensorflow/contrib/distributions/BUILD @@ -193,38 +193,6 @@ cuda_py_test( tags = ["notap"], # http://b/30441813 ) -cuda_py_test( - name = "bernoulli_test", - size = "small", - srcs = ["python/kernel_tests/bernoulli_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:array_ops", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:math_ops", - "//tensorflow/python:platform_test", - ], -) - -cuda_py_test( - name = "beta_test", - size = "small", - srcs = ["python/kernel_tests/beta_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:client", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:math_ops", - "//tensorflow/python:nn_ops", - "//tensorflow/python:platform_test", - ], -) - cuda_py_test( name = "binomial_test", size = "small", @@ -238,24 +206,6 @@ cuda_py_test( ], ) -cuda_py_test( - name = "categorical_test", - size = "small", - srcs = ["python/kernel_tests/categorical_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:array_ops", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:math_ops", - "//tensorflow/python:platform_test", - "//tensorflow/python:random_ops", - ], -) - cuda_py_test( name = "chi2_test", srcs = ["python/kernel_tests/chi2_test.py"], @@ -287,66 +237,6 @@ cuda_py_test( ], ) -cuda_py_test( - name = "dirichlet_test", - size = "small", - srcs = ["python/kernel_tests/dirichlet_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:platform_test", - ], -) - -cuda_py_test( - name = "dirichlet_multinomial_test", - size = "medium", - srcs = ["python/kernel_tests/dirichlet_multinomial_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:array_ops", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:math_ops", - "//tensorflow/python:platform_test", - ], -) - -cuda_py_test( - name = "exponential_test", - srcs = ["python/kernel_tests/exponential_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:client", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:nn_ops", - "//tensorflow/python:platform_test", - ], -) - -cuda_py_test( - name = "gamma_test", - srcs = ["python/kernel_tests/gamma_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:client", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - 
"//tensorflow/python:nn_ops", - "//tensorflow/python:platform_test", - ], -) - cuda_py_test( name = "geometric_test", size = "small", @@ -378,36 +268,6 @@ cuda_py_test( ], ) -cuda_py_test( - name = "laplace_test", - srcs = ["python/kernel_tests/laplace_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:client", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:nn_ops", - "//tensorflow/python:platform_test", - ], -) - -cuda_py_test( - name = "multinomial_test", - srcs = ["python/kernel_tests/multinomial_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:array_ops", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:math_ops", - "//tensorflow/python:platform_test", - ], -) - cuda_py_test( name = "mvn_diag_test", size = "small", @@ -528,24 +388,6 @@ cuda_py_test( tags = ["nomsan"], # disable to avoid false positives from scipy. ) -cuda_py_test( - name = "student_t_test", - size = "small", - srcs = ["python/kernel_tests/student_t_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:client_testlib", - "//tensorflow/python:framework", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:math_ops", - "//tensorflow/python:nn_ops", - "//tensorflow/python:platform_test", - ], - tags = ["nomsan"], # disable to avoid false positives from scipy. -) - cuda_py_test( name = "vector_student_t_test", size = "medium", @@ -562,22 +404,6 @@ cuda_py_test( ], ) -cuda_py_test( - name = "uniform_test", - size = "small", - srcs = ["python/kernel_tests/uniform_test.py"], - additional_deps = [ - ":distributions_py", - "//third_party/py/numpy", - "//tensorflow/python:array_ops", - "//tensorflow/python:client_testlib", - "//tensorflow/python:errors", - "//tensorflow/python:framework_for_generated_wrappers", - "//tensorflow/python:framework_test_lib", - "//tensorflow/python:math_ops", - ], -) - cuda_py_test( name = "wishart_test", size = "small", diff --git a/tensorflow/contrib/distributions/__init__.py b/tensorflow/contrib/distributions/__init__.py index 15e33c2c6f..6ea74fab0e 100644 --- a/tensorflow/contrib/distributions/__init__.py +++ b/tensorflow/contrib/distributions/__init__.py @@ -15,74 +15,6 @@ """Classes representing statistical distributions and ops for working with them. See the @{$python/contrib.distributions} guide. 
- -## Distribution Object -@@ReparameterizationType -@@Distribution - -## Individual Distributions -@@Binomial -@@Bernoulli -@@BernoulliWithSigmoidProbs -@@Beta -@@BetaWithSoftplusConcentration -@@Categorical -@@Chi2 -@@Chi2WithAbsDf -@@Deterministic -@@VectorDeterministic -@@Exponential -@@ExponentialWithSoftplusRate -@@Gamma -@@GammaWithSoftplusConcentrationRate -@@Geometric -@@InverseGamma -@@InverseGammaWithSoftplusConcentrationRate -@@Laplace -@@LaplaceWithSoftplusScale -@@Logistic -@@NegativeBinomial -@@Normal -@@NormalWithSoftplusScale -@@Poisson -@@StudentT -@@StudentTWithAbsDfSoftplusScale -@@Uniform - -@@MultivariateNormalDiag -@@MultivariateNormalTriL -@@MultivariateNormalDiagPlusLowRank -@@MultivariateNormalDiagWithSoftplusScale - -@@Dirichlet -@@DirichletMultinomial -@@Multinomial -@@WishartCholesky -@@WishartFull - -@@TransformedDistribution -@@QuantizedDistribution - -@@Mixture - -@@ExpRelaxedOneHotCategorical -@@OneHotCategorical -@@RelaxedBernoulli -@@RelaxedOneHotCategorical - -## Kullback-Leibler Divergence -@@kl_divergence -@@RegisterKL - -## Helper Functions -@@matrix_diag_transform -@@normal_conjugates_known_scale_posterior -@@normal_conjugates_known_scale_predictive -@@softplus_inverse - -## Functions for statistics of samples -@@percentile - """ from __future__ import absolute_import from __future__ import division @@ -91,25 +23,16 @@ from __future__ import print_function # pylint: disable=unused-import,wildcard-import,line-too-long,g-importing-member from tensorflow.contrib.distributions.python.ops import bijectors -from tensorflow.contrib.distributions.python.ops.bernoulli import * -from tensorflow.contrib.distributions.python.ops.beta import * from tensorflow.contrib.distributions.python.ops.binomial import * -from tensorflow.contrib.distributions.python.ops.categorical import * from tensorflow.contrib.distributions.python.ops.chi2 import * from tensorflow.contrib.distributions.python.ops.conditional_transformed_distribution import * from tensorflow.contrib.distributions.python.ops.deterministic import * -from tensorflow.contrib.distributions.python.ops.dirichlet import * -from tensorflow.contrib.distributions.python.ops.dirichlet_multinomial import * from tensorflow.contrib.distributions.python.ops.distribution_util import matrix_diag_transform from tensorflow.contrib.distributions.python.ops.distribution_util import softplus_inverse -from tensorflow.contrib.distributions.python.ops.exponential import * -from tensorflow.contrib.distributions.python.ops.gamma import * from tensorflow.contrib.distributions.python.ops.geometric import * from tensorflow.contrib.distributions.python.ops.inverse_gamma import * -from tensorflow.contrib.distributions.python.ops.laplace import * from tensorflow.contrib.distributions.python.ops.logistic import * from tensorflow.contrib.distributions.python.ops.mixture import * -from tensorflow.contrib.distributions.python.ops.multinomial import * from tensorflow.contrib.distributions.python.ops.mvn_diag import * from tensorflow.contrib.distributions.python.ops.mvn_diag_plus_low_rank import * from tensorflow.contrib.distributions.python.ops.mvn_tril import * @@ -121,14 +44,23 @@ from tensorflow.contrib.distributions.python.ops.quantized_distribution import * from tensorflow.contrib.distributions.python.ops.relaxed_bernoulli import * from tensorflow.contrib.distributions.python.ops.relaxed_onehot_categorical import * from tensorflow.contrib.distributions.python.ops.sample_stats import * -from 
tensorflow.contrib.distributions.python.ops.student_t import * from tensorflow.contrib.distributions.python.ops.transformed_distribution import * -from tensorflow.contrib.distributions.python.ops.uniform import * from tensorflow.contrib.distributions.python.ops.wishart import * +from tensorflow.python.ops.distributions.bernoulli import * +from tensorflow.python.ops.distributions.beta import * +from tensorflow.python.ops.distributions.categorical import * from tensorflow.python.ops.distributions.conditional_distribution import * +from tensorflow.python.ops.distributions.dirichlet import * +from tensorflow.python.ops.distributions.dirichlet_multinomial import * from tensorflow.python.ops.distributions.distribution import * +from tensorflow.python.ops.distributions.exponential import * +from tensorflow.python.ops.distributions.gamma import * from tensorflow.python.ops.distributions.kullback_leibler import * +from tensorflow.python.ops.distributions.laplace import * +from tensorflow.python.ops.distributions.multinomial import * from tensorflow.python.ops.distributions.normal import * +from tensorflow.python.ops.distributions.student_t import * +from tensorflow.python.ops.distributions.uniform import * # pylint: enable=unused-import,wildcard-import,line-too-long,g-importing-member @@ -140,6 +72,71 @@ _allowed_symbols = [ 'ConditionalTransformedDistribution', 'FULLY_REPARAMETERIZED', 'NOT_REPARAMETERIZED', + 'Affine', + 'AffineLinearOperator', + 'Bijector', + 'Chain', + 'CholeskyOuterProduct', + 'Exp', + 'Identity', + 'Inline', + 'Invert', + 'PowerTransform', + 'SigmoidCentered', + 'SoftmaxCentered', + 'Softplus', + 'ReparameterizationType', + 'Distribution', + 'Binomial', + 'Bernoulli', + 'BernoulliWithSigmoidProbs', + 'Beta', + 'BetaWithSoftplusConcentration', + 'Categorical', + 'Chi2', + 'Chi2WithAbsDf', + 'Deterministic', + 'VectorDeterministic', + 'Exponential', + 'ExponentialWithSoftplusRate', + 'Gamma', + 'GammaWithSoftplusConcentrationRate', + 'Geometric', + 'InverseGamma', + 'InverseGammaWithSoftplusConcentrationRate', + 'Laplace', + 'LaplaceWithSoftplusScale', + 'Logistic', + 'NegativeBinomial', + 'Normal', + 'NormalWithSoftplusScale', + 'Poisson', + 'StudentT', + 'StudentTWithAbsDfSoftplusScale', + 'Uniform', + 'MultivariateNormalDiag', + 'MultivariateNormalTriL', + 'MultivariateNormalDiagPlusLowRank', + 'MultivariateNormalDiagWithSoftplusScale', + 'Dirichlet', + 'DirichletMultinomial', + 'Multinomial', + 'WishartCholesky', + 'WishartFull', + 'TransformedDistribution', + 'QuantizedDistribution', + 'Mixture', + 'ExpRelaxedOneHotCategorical', + 'OneHotCategorical', + 'RelaxedBernoulli', + 'RelaxedOneHotCategorical', + 'kl_divergence', + 'RegisterKL', + 'matrix_diag_transform', + 'normal_conjugates_known_scale_posterior', + 'normal_conjugates_known_scale_predictive', + 'softplus_inverse', + 'percentile' ] remove_undocumented(__name__, _allowed_symbols) diff --git a/tensorflow/contrib/distributions/python/kernel_tests/bernoulli_test.py b/tensorflow/contrib/distributions/python/kernel_tests/bernoulli_test.py deleted file mode 100644 index e8b0eb4eb8..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/bernoulli_test.py +++ /dev/null @@ -1,300 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""Tests for the Bernoulli distribution.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import scipy.special -from tensorflow.contrib.distributions.python.ops import bernoulli -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.platform import test - - -def make_bernoulli(batch_shape, dtype=dtypes.int32): - p = np.random.uniform(size=list(batch_shape)) - p = constant_op.constant(p, dtype=dtypes.float32) - return bernoulli.Bernoulli(probs=p, dtype=dtype) - - -def entropy(p): - q = 1. - p - return -q * np.log(q) - p * np.log(p) - - -class BernoulliTest(test.TestCase): - - def testP(self): - p = [0.2, 0.4] - dist = bernoulli.Bernoulli(probs=p) - with self.test_session(): - self.assertAllClose(p, dist.probs.eval()) - - def testLogits(self): - logits = [-42., 42.] - dist = bernoulli.Bernoulli(logits=logits) - with self.test_session(): - self.assertAllClose(logits, dist.logits.eval()) - - with self.test_session(): - self.assertAllClose(scipy.special.expit(logits), dist.probs.eval()) - - p = [0.01, 0.99, 0.42] - dist = bernoulli.Bernoulli(probs=p) - with self.test_session(): - self.assertAllClose(scipy.special.logit(p), dist.logits.eval()) - - def testInvalidP(self): - invalid_ps = [1.01, 2.] - for p in invalid_ps: - with self.test_session(): - with self.assertRaisesOpError("probs has components greater than 1"): - dist = bernoulli.Bernoulli(probs=p, validate_args=True) - dist.probs.eval() - - invalid_ps = [-0.01, -3.] 
- for p in invalid_ps: - with self.test_session(): - with self.assertRaisesOpError("Condition x >= 0"): - dist = bernoulli.Bernoulli(probs=p, validate_args=True) - dist.probs.eval() - - valid_ps = [0.0, 0.5, 1.0] - for p in valid_ps: - with self.test_session(): - dist = bernoulli.Bernoulli(probs=p) - self.assertEqual(p, dist.probs.eval()) # Should not fail - - def testShapes(self): - with self.test_session(): - for batch_shape in ([], [1], [2, 3, 4]): - dist = make_bernoulli(batch_shape) - self.assertAllEqual(batch_shape, dist.batch_shape.as_list()) - self.assertAllEqual(batch_shape, dist.batch_shape_tensor().eval()) - self.assertAllEqual([], dist.event_shape.as_list()) - self.assertAllEqual([], dist.event_shape_tensor().eval()) - - def testDtype(self): - dist = make_bernoulli([]) - self.assertEqual(dist.dtype, dtypes.int32) - self.assertEqual(dist.dtype, dist.sample(5).dtype) - self.assertEqual(dist.dtype, dist.mode().dtype) - self.assertEqual(dist.probs.dtype, dist.mean().dtype) - self.assertEqual(dist.probs.dtype, dist.variance().dtype) - self.assertEqual(dist.probs.dtype, dist.stddev().dtype) - self.assertEqual(dist.probs.dtype, dist.entropy().dtype) - self.assertEqual(dist.probs.dtype, dist.prob(0).dtype) - self.assertEqual(dist.probs.dtype, dist.log_prob(0).dtype) - - dist64 = make_bernoulli([], dtypes.int64) - self.assertEqual(dist64.dtype, dtypes.int64) - self.assertEqual(dist64.dtype, dist64.sample(5).dtype) - self.assertEqual(dist64.dtype, dist64.mode().dtype) - - def _testPmf(self, **kwargs): - dist = bernoulli.Bernoulli(**kwargs) - with self.test_session(): - # pylint: disable=bad-continuation - xs = [ - 0, - [1], - [1, 0], - [[1, 0]], - [[1, 0], [1, 1]], - ] - expected_pmfs = [ - [[0.8, 0.6], [0.7, 0.4]], - [[0.2, 0.4], [0.3, 0.6]], - [[0.2, 0.6], [0.3, 0.4]], - [[0.2, 0.6], [0.3, 0.4]], - [[0.2, 0.6], [0.3, 0.6]], - ] - # pylint: enable=bad-continuation - - for x, expected_pmf in zip(xs, expected_pmfs): - self.assertAllClose(dist.prob(x).eval(), expected_pmf) - self.assertAllClose(dist.log_prob(x).eval(), np.log(expected_pmf)) - - def testPmfCorrectBroadcastDynamicShape(self): - with self.test_session(): - p = array_ops.placeholder(dtype=dtypes.float32) - dist = bernoulli.Bernoulli(probs=p) - event1 = [1, 0, 1] - event2 = [[1, 0, 1]] - self.assertAllClose( - dist.prob(event1).eval({ - p: [0.2, 0.3, 0.4] - }), [0.2, 0.7, 0.4]) - self.assertAllClose( - dist.prob(event2).eval({ - p: [0.2, 0.3, 0.4] - }), [[0.2, 0.7, 0.4]]) - - def testPmfInvalid(self): - p = [0.1, 0.2, 0.7] - with self.test_session(): - dist = bernoulli.Bernoulli(probs=p, validate_args=True) - with self.assertRaisesOpError("must be non-negative."): - dist.prob([1, 1, -1]).eval() - with self.assertRaisesOpError("is not less than or equal to 1."): - dist.prob([2, 0, 1]).eval() - - def testPmfWithP(self): - p = [[0.2, 0.4], [0.3, 0.6]] - self._testPmf(probs=p) - self._testPmf(logits=scipy.special.logit(p)) - - def testBroadcasting(self): - with self.test_session(): - p = array_ops.placeholder(dtypes.float32) - dist = bernoulli.Bernoulli(probs=p) - self.assertAllClose(np.log(0.5), dist.log_prob(1).eval({p: 0.5})) - self.assertAllClose( - np.log([0.5, 0.5, 0.5]), dist.log_prob([1, 1, 1]).eval({ - p: 0.5 - })) - self.assertAllClose( - np.log([0.5, 0.5, 0.5]), dist.log_prob(1).eval({ - p: [0.5, 0.5, 0.5] - })) - - def testPmfShapes(self): - with self.test_session(): - p = array_ops.placeholder(dtypes.float32, shape=[None, 1]) - dist = bernoulli.Bernoulli(probs=p) - self.assertEqual(2, len(dist.log_prob(1).eval({p: 
[[0.5], [0.5]]}).shape)) - - with self.test_session(): - dist = bernoulli.Bernoulli(probs=0.5) - self.assertEqual(2, len(dist.log_prob([[1], [1]]).eval().shape)) - - with self.test_session(): - dist = bernoulli.Bernoulli(probs=0.5) - self.assertEqual((), dist.log_prob(1).get_shape()) - self.assertEqual((1), dist.log_prob([1]).get_shape()) - self.assertEqual((2, 1), dist.log_prob([[1], [1]]).get_shape()) - - with self.test_session(): - dist = bernoulli.Bernoulli(probs=[[0.5], [0.5]]) - self.assertEqual((2, 1), dist.log_prob(1).get_shape()) - - def testBoundaryConditions(self): - with self.test_session(): - dist = bernoulli.Bernoulli(probs=1.0) - self.assertAllClose(np.nan, dist.log_prob(0).eval()) - self.assertAllClose([np.nan], [dist.log_prob(1).eval()]) - - def testEntropyNoBatch(self): - p = 0.2 - dist = bernoulli.Bernoulli(probs=p) - with self.test_session(): - self.assertAllClose(dist.entropy().eval(), entropy(p)) - - def testEntropyWithBatch(self): - p = [[0.1, 0.7], [0.2, 0.6]] - dist = bernoulli.Bernoulli(probs=p, validate_args=False) - with self.test_session(): - self.assertAllClose(dist.entropy().eval(), [[entropy(0.1), entropy(0.7)], - [entropy(0.2), entropy(0.6)]]) - - def testSampleN(self): - with self.test_session(): - p = [0.2, 0.6] - dist = bernoulli.Bernoulli(probs=p) - n = 100000 - samples = dist.sample(n) - samples.set_shape([n, 2]) - self.assertEqual(samples.dtype, dtypes.int32) - sample_values = samples.eval() - self.assertTrue(np.all(sample_values >= 0)) - self.assertTrue(np.all(sample_values <= 1)) - # Note that the standard error for the sample mean is ~ sqrt(p * (1 - p) / - # n). This means that the tolerance is very sensitive to the value of p - # as well as n. - self.assertAllClose(p, np.mean(sample_values, axis=0), atol=1e-2) - self.assertEqual(set([0, 1]), set(sample_values.flatten())) - # In this test we're just interested in verifying there isn't a crash - # owing to mismatched types. b/30940152 - dist = bernoulli.Bernoulli(np.log([.2, .4])) - self.assertAllEqual((1, 2), dist.sample(1, seed=42).get_shape().as_list()) - - def testSampleActsLikeSampleN(self): - with self.test_session() as sess: - p = [0.2, 0.6] - dist = bernoulli.Bernoulli(probs=p) - n = 1000 - seed = 42 - self.assertAllEqual( - dist.sample(n, seed).eval(), dist.sample(n, seed).eval()) - n = array_ops.placeholder(dtypes.int32) - sample, sample = sess.run([dist.sample(n, seed), dist.sample(n, seed)], - feed_dict={n: 1000}) - self.assertAllEqual(sample, sample) - - def testMean(self): - with self.test_session(): - p = np.array([[0.2, 0.7], [0.5, 0.4]], dtype=np.float32) - dist = bernoulli.Bernoulli(probs=p) - self.assertAllEqual(dist.mean().eval(), p) - - def testVarianceAndStd(self): - var = lambda p: p * (1. 
- p) - with self.test_session(): - p = [[0.2, 0.7], [0.5, 0.4]] - dist = bernoulli.Bernoulli(probs=p) - self.assertAllClose( - dist.variance().eval(), - np.array( - [[var(0.2), var(0.7)], [var(0.5), var(0.4)]], dtype=np.float32)) - self.assertAllClose( - dist.stddev().eval(), - np.array( - [[np.sqrt(var(0.2)), np.sqrt(var(0.7))], - [np.sqrt(var(0.5)), np.sqrt(var(0.4))]], - dtype=np.float32)) - - def testBernoulliWithSigmoidProbs(self): - p = np.array([8.3, 4.2]) - dist = bernoulli.BernoulliWithSigmoidProbs(logits=p) - with self.test_session(): - self.assertAllClose(math_ops.sigmoid(p).eval(), dist.probs.eval()) - - def testBernoulliBernoulliKL(self): - with self.test_session() as sess: - batch_size = 6 - a_p = np.array([0.5] * batch_size, dtype=np.float32) - b_p = np.array([0.4] * batch_size, dtype=np.float32) - - a = bernoulli.Bernoulli(probs=a_p) - b = bernoulli.Bernoulli(probs=b_p) - - kl = kullback_leibler.kl_divergence(a, b) - kl_val = sess.run(kl) - - kl_expected = (a_p * np.log(a_p / b_p) + (1. - a_p) * np.log( - (1. - a_p) / (1. - b_p))) - - self.assertEqual(kl.get_shape(), (batch_size,)) - self.assertAllClose(kl_val, kl_expected) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/beta_test.py b/tensorflow/contrib/distributions/python/kernel_tests/beta_test.py deleted file mode 100644 index ec16a85991..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/beta_test.py +++ /dev/null @@ -1,363 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ============================================================================== -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from scipy import special -from scipy import stats -from tensorflow.contrib.distributions.python.ops import beta as beta_lib -from tensorflow.python.client import session -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import random_seed -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn_ops -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.platform import test - - -class BetaTest(test.TestCase): - - def testSimpleShapes(self): - with self.test_session(): - a = np.random.rand(3) - b = np.random.rand(3) - dist = beta_lib.Beta(a, b) - self.assertAllEqual([], dist.event_shape_tensor().eval()) - self.assertAllEqual([3], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([3]), dist.batch_shape) - - def testComplexShapes(self): - with self.test_session(): - a = np.random.rand(3, 2, 2) - b = np.random.rand(3, 2, 2) - dist = beta_lib.Beta(a, b) - self.assertAllEqual([], dist.event_shape_tensor().eval()) - self.assertAllEqual([3, 2, 2], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([]), dist.event_shape) - self.assertEqual( - tensor_shape.TensorShape([3, 2, 2]), dist.batch_shape) - - def testComplexShapesBroadcast(self): - with self.test_session(): - a = np.random.rand(3, 2, 2) - b = np.random.rand(2, 2) - dist = beta_lib.Beta(a, b) - self.assertAllEqual([], dist.event_shape_tensor().eval()) - self.assertAllEqual([3, 2, 2], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([]), dist.event_shape) - self.assertEqual( - tensor_shape.TensorShape([3, 2, 2]), dist.batch_shape) - - def testAlphaProperty(self): - a = [[1., 2, 3]] - b = [[2., 4, 3]] - with self.test_session(): - dist = beta_lib.Beta(a, b) - self.assertEqual([1, 3], dist.concentration1.get_shape()) - self.assertAllClose(a, dist.concentration1.eval()) - - def testBetaProperty(self): - a = [[1., 2, 3]] - b = [[2., 4, 3]] - with self.test_session(): - dist = beta_lib.Beta(a, b) - self.assertEqual([1, 3], dist.concentration0.get_shape()) - self.assertAllClose(b, dist.concentration0.eval()) - - def testPdfXProper(self): - a = [[1., 2, 3]] - b = [[2., 4, 3]] - with self.test_session(): - dist = beta_lib.Beta(a, b, validate_args=True) - dist.prob([.1, .3, .6]).eval() - dist.prob([.2, .3, .5]).eval() - # Either condition can trigger. - with self.assertRaisesOpError("sample must be positive"): - dist.prob([-1., 0.1, 0.5]).eval() - with self.assertRaisesOpError("sample must be positive"): - dist.prob([0., 0.1, 0.5]).eval() - with self.assertRaisesOpError("sample must be no larger than `1`"): - dist.prob([.1, .2, 1.2]).eval() - - def testPdfTwoBatches(self): - with self.test_session(): - a = [1., 2] - b = [1., 2] - x = [.5, .5] - dist = beta_lib.Beta(a, b) - pdf = dist.prob(x) - self.assertAllClose([1., 3. / 2], pdf.eval()) - self.assertEqual((2,), pdf.get_shape()) - - def testPdfTwoBatchesNontrivialX(self): - with self.test_session(): - a = [1., 2] - b = [1., 2] - x = [.3, .7] - dist = beta_lib.Beta(a, b) - pdf = dist.prob(x) - self.assertAllClose([1, 63. 
/ 50], pdf.eval()) - self.assertEqual((2,), pdf.get_shape()) - - def testPdfUniformZeroBatch(self): - with self.test_session(): - # This is equivalent to a uniform distribution - a = 1. - b = 1. - x = np.array([.1, .2, .3, .5, .8], dtype=np.float32) - dist = beta_lib.Beta(a, b) - pdf = dist.prob(x) - self.assertAllClose([1.] * 5, pdf.eval()) - self.assertEqual((5,), pdf.get_shape()) - - def testPdfAlphaStretchedInBroadcastWhenSameRank(self): - with self.test_session(): - a = [[1., 2]] - b = [[1., 2]] - x = [[.5, .5], [.3, .7]] - dist = beta_lib.Beta(a, b) - pdf = dist.prob(x) - self.assertAllClose([[1., 3. / 2], [1., 63. / 50]], pdf.eval()) - self.assertEqual((2, 2), pdf.get_shape()) - - def testPdfAlphaStretchedInBroadcastWhenLowerRank(self): - with self.test_session(): - a = [1., 2] - b = [1., 2] - x = [[.5, .5], [.2, .8]] - pdf = beta_lib.Beta(a, b).prob(x) - self.assertAllClose([[1., 3. / 2], [1., 24. / 25]], pdf.eval()) - self.assertEqual((2, 2), pdf.get_shape()) - - def testPdfXStretchedInBroadcastWhenSameRank(self): - with self.test_session(): - a = [[1., 2], [2., 3]] - b = [[1., 2], [2., 3]] - x = [[.5, .5]] - pdf = beta_lib.Beta(a, b).prob(x) - self.assertAllClose([[1., 3. / 2], [3. / 2, 15. / 8]], pdf.eval()) - self.assertEqual((2, 2), pdf.get_shape()) - - def testPdfXStretchedInBroadcastWhenLowerRank(self): - with self.test_session(): - a = [[1., 2], [2., 3]] - b = [[1., 2], [2., 3]] - x = [.5, .5] - pdf = beta_lib.Beta(a, b).prob(x) - self.assertAllClose([[1., 3. / 2], [3. / 2, 15. / 8]], pdf.eval()) - self.assertEqual((2, 2), pdf.get_shape()) - - def testBetaMean(self): - with session.Session(): - a = [1., 2, 3] - b = [2., 4, 1.2] - expected_mean = stats.beta.mean(a, b) - dist = beta_lib.Beta(a, b) - self.assertEqual(dist.mean().get_shape(), (3,)) - self.assertAllClose(expected_mean, dist.mean().eval()) - - def testBetaVariance(self): - with session.Session(): - a = [1., 2, 3] - b = [2., 4, 1.2] - expected_variance = stats.beta.var(a, b) - dist = beta_lib.Beta(a, b) - self.assertEqual(dist.variance().get_shape(), (3,)) - self.assertAllClose(expected_variance, dist.variance().eval()) - - def testBetaMode(self): - with session.Session(): - a = np.array([1.1, 2, 3]) - b = np.array([2., 4, 1.2]) - expected_mode = (a - 1) / (a + b - 2) - dist = beta_lib.Beta(a, b) - self.assertEqual(dist.mode().get_shape(), (3,)) - self.assertAllClose(expected_mode, dist.mode().eval()) - - def testBetaModeInvalid(self): - with session.Session(): - a = np.array([1., 2, 3]) - b = np.array([2., 4, 1.2]) - dist = beta_lib.Beta(a, b, allow_nan_stats=False) - with self.assertRaisesOpError("Condition x < y.*"): - dist.mode().eval() - - a = np.array([2., 2, 3]) - b = np.array([1., 4, 1.2]) - dist = beta_lib.Beta(a, b, allow_nan_stats=False) - with self.assertRaisesOpError("Condition x < y.*"): - dist.mode().eval() - - def testBetaModeEnableAllowNanStats(self): - with session.Session(): - a = np.array([1., 2, 3]) - b = np.array([2., 4, 1.2]) - dist = beta_lib.Beta(a, b, allow_nan_stats=True) - - expected_mode = (a - 1) / (a + b - 2) - expected_mode[0] = np.nan - self.assertEqual((3,), dist.mode().get_shape()) - self.assertAllClose(expected_mode, dist.mode().eval()) - - a = np.array([2., 2, 3]) - b = np.array([1., 4, 1.2]) - dist = beta_lib.Beta(a, b, allow_nan_stats=True) - - expected_mode = (a - 1) / (a + b - 2) - expected_mode[0] = np.nan - self.assertEqual((3,), dist.mode().get_shape()) - self.assertAllClose(expected_mode, dist.mode().eval()) - - def testBetaEntropy(self): - with session.Session(): - a = 
[1., 2, 3] - b = [2., 4, 1.2] - expected_entropy = stats.beta.entropy(a, b) - dist = beta_lib.Beta(a, b) - self.assertEqual(dist.entropy().get_shape(), (3,)) - self.assertAllClose(expected_entropy, dist.entropy().eval()) - - def testBetaSample(self): - with self.test_session(): - a = 1. - b = 2. - beta = beta_lib.Beta(a, b) - n = constant_op.constant(100000) - samples = beta.sample(n) - sample_values = samples.eval() - self.assertEqual(sample_values.shape, (100000,)) - self.assertFalse(np.any(sample_values < 0.0)) - self.assertLess( - stats.kstest( - # Beta is a univariate distribution. - sample_values, - stats.beta(a=1., b=2.).cdf)[0], - 0.01) - # The standard error of the sample mean is 1 / (sqrt(18 * n)) - self.assertAllClose( - sample_values.mean(axis=0), stats.beta.mean(a, b), atol=1e-2) - self.assertAllClose( - np.cov(sample_values, rowvar=0), stats.beta.var(a, b), atol=1e-1) - - # Test that sampling with the same seed twice gives the same results. - def testBetaSampleMultipleTimes(self): - with self.test_session(): - a_val = 1. - b_val = 2. - n_val = 100 - - random_seed.set_random_seed(654321) - beta1 = beta_lib.Beta(concentration1=a_val, - concentration0=b_val, - name="beta1") - samples1 = beta1.sample(n_val, seed=123456).eval() - - random_seed.set_random_seed(654321) - beta2 = beta_lib.Beta(concentration1=a_val, - concentration0=b_val, - name="beta2") - samples2 = beta2.sample(n_val, seed=123456).eval() - - self.assertAllClose(samples1, samples2) - - def testBetaSampleMultidimensional(self): - with self.test_session(): - a = np.random.rand(3, 2, 2).astype(np.float32) - b = np.random.rand(3, 2, 2).astype(np.float32) - beta = beta_lib.Beta(a, b) - n = constant_op.constant(100000) - samples = beta.sample(n) - sample_values = samples.eval() - self.assertEqual(sample_values.shape, (100000, 3, 2, 2)) - self.assertFalse(np.any(sample_values < 0.0)) - self.assertAllClose( - sample_values[:, 1, :].mean(axis=0), - stats.beta.mean(a, b)[1, :], - atol=1e-1) - - def testBetaCdf(self): - with self.test_session(): - shape = (30, 40, 50) - for dt in (np.float32, np.float64): - a = 10. * np.random.random(shape).astype(dt) - b = 10. * np.random.random(shape).astype(dt) - x = np.random.random(shape).astype(dt) - actual = beta_lib.Beta(a, b).cdf(x).eval() - self.assertAllEqual(np.ones(shape, dtype=np.bool), 0. <= x) - self.assertAllEqual(np.ones(shape, dtype=np.bool), 1. >= x) - self.assertAllClose(stats.beta.cdf(x, a, b), actual, rtol=1e-4, atol=0) - - def testBetaLogCdf(self): - with self.test_session(): - shape = (30, 40, 50) - for dt in (np.float32, np.float64): - a = 10. * np.random.random(shape).astype(dt) - b = 10. * np.random.random(shape).astype(dt) - x = np.random.random(shape).astype(dt) - actual = math_ops.exp(beta_lib.Beta(a, b).log_cdf(x)).eval() - self.assertAllEqual(np.ones(shape, dtype=np.bool), 0. <= x) - self.assertAllEqual(np.ones(shape, dtype=np.bool), 1. 
>= x) - self.assertAllClose(stats.beta.cdf(x, a, b), actual, rtol=1e-4, atol=0) - - def testBetaWithSoftplusConcentration(self): - with self.test_session(): - a, b = -4.2, -9.1 - dist = beta_lib.BetaWithSoftplusConcentration(a, b) - self.assertAllClose(nn_ops.softplus(a).eval(), dist.concentration1.eval()) - self.assertAllClose(nn_ops.softplus(b).eval(), dist.concentration0.eval()) - - def testBetaBetaKL(self): - with self.test_session() as sess: - for shape in [(10,), (4, 5)]: - a1 = 6.0 * np.random.random(size=shape) + 1e-4 - b1 = 6.0 * np.random.random(size=shape) + 1e-4 - a2 = 6.0 * np.random.random(size=shape) + 1e-4 - b2 = 6.0 * np.random.random(size=shape) + 1e-4 - # Take inverse softplus of values to test BetaWithSoftplusConcentration - a1_sp = np.log(np.exp(a1) - 1.0) - b1_sp = np.log(np.exp(b1) - 1.0) - a2_sp = np.log(np.exp(a2) - 1.0) - b2_sp = np.log(np.exp(b2) - 1.0) - - d1 = beta_lib.Beta(concentration1=a1, concentration0=b1) - d2 = beta_lib.Beta(concentration1=a2, concentration0=b2) - d1_sp = beta_lib.BetaWithSoftplusConcentration(concentration1=a1_sp, - concentration0=b1_sp) - d2_sp = beta_lib.BetaWithSoftplusConcentration(concentration1=a2_sp, - concentration0=b2_sp) - - kl_expected = (special.betaln(a2, b2) - special.betaln(a1, b1) + - (a1 - a2) * special.digamma(a1) + - (b1 - b2) * special.digamma(b1) + - (a2 - a1 + b2 - b1) * special.digamma(a1 + b1)) - - for dist1 in [d1, d1_sp]: - for dist2 in [d2, d2_sp]: - kl = kullback_leibler.kl_divergence(dist1, dist2) - kl_val = sess.run(kl) - self.assertEqual(kl.get_shape(), shape) - self.assertAllClose(kl_val, kl_expected) - - # Make sure KL(d1||d1) is 0 - kl_same = sess.run(kullback_leibler.kl_divergence(d1, d1)) - self.assertAllClose(kl_same, np.zeros_like(kl_expected)) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/bijectors/cholesky_outer_product_test.py b/tensorflow/contrib/distributions/python/kernel_tests/bijectors/cholesky_outer_product_test.py index 267e4ad350..a4688829f1 100644 --- a/tensorflow/contrib/distributions/python/kernel_tests/bijectors/cholesky_outer_product_test.py +++ b/tensorflow/contrib/distributions/python/kernel_tests/bijectors/cholesky_outer_product_test.py @@ -19,11 +19,11 @@ from __future__ import division from __future__ import print_function from tensorflow.contrib.distributions.python.ops import bijectors -from tensorflow.contrib.distributions.python.ops import gamma as gamma_lib from tensorflow.contrib.distributions.python.ops import transformed_distribution as transformed_distribution_lib from tensorflow.contrib.distributions.python.ops.bijectors.bijector_test_util import assert_scalar_congruency from tensorflow.python.framework import tensor_shape from tensorflow.python.ops import array_ops +from tensorflow.python.ops.distributions import gamma as gamma_lib from tensorflow.python.platform import test diff --git a/tensorflow/contrib/distributions/python/kernel_tests/bijectors/invert_test.py b/tensorflow/contrib/distributions/python/kernel_tests/bijectors/invert_test.py index 267e4ad350..a4688829f1 100644 --- a/tensorflow/contrib/distributions/python/kernel_tests/bijectors/invert_test.py +++ b/tensorflow/contrib/distributions/python/kernel_tests/bijectors/invert_test.py @@ -19,11 +19,11 @@ from __future__ import division from __future__ import print_function from tensorflow.contrib.distributions.python.ops import bijectors -from tensorflow.contrib.distributions.python.ops import gamma as gamma_lib from 
tensorflow.contrib.distributions.python.ops import transformed_distribution as transformed_distribution_lib from tensorflow.contrib.distributions.python.ops.bijectors.bijector_test_util import assert_scalar_congruency from tensorflow.python.framework import tensor_shape from tensorflow.python.ops import array_ops +from tensorflow.python.ops.distributions import gamma as gamma_lib from tensorflow.python.platform import test diff --git a/tensorflow/contrib/distributions/python/kernel_tests/categorical_test.py b/tensorflow/contrib/distributions/python/kernel_tests/categorical_test.py deleted file mode 100644 index 269c02ede3..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/categorical_test.py +++ /dev/null @@ -1,297 +0,0 @@ -# Copyright 2015 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""Tests for Categorical distribution.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - -from tensorflow.contrib.distributions.python.ops import categorical -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import tensor_util -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import gradients_impl -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn_ops -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.platform import test - - -def make_categorical(batch_shape, num_classes, dtype=dtypes.int32): - logits = random_ops.random_uniform( - list(batch_shape) + [num_classes], -10, 10, dtype=dtypes.float32) - 50. - return categorical.Categorical(logits, dtype=dtype) - - -class CategoricalTest(test.TestCase): - - def testP(self): - p = [0.2, 0.8] - dist = categorical.Categorical(probs=p) - with self.test_session(): - self.assertAllClose(p, dist.probs.eval()) - self.assertAllEqual([2], dist.logits.get_shape()) - - def testLogits(self): - p = np.array([0.2, 0.8], dtype=np.float32) - logits = np.log(p) - 50. 
- dist = categorical.Categorical(logits=logits) - with self.test_session(): - self.assertAllEqual([2], dist.probs.get_shape()) - self.assertAllEqual([2], dist.logits.get_shape()) - self.assertAllClose(dist.probs.eval(), p) - self.assertAllClose(dist.logits.eval(), logits) - - def testShapes(self): - with self.test_session(): - for batch_shape in ([], [1], [2, 3, 4]): - dist = make_categorical(batch_shape, 10) - self.assertAllEqual(batch_shape, dist.batch_shape) - self.assertAllEqual(batch_shape, dist.batch_shape_tensor().eval()) - self.assertAllEqual([], dist.event_shape) - self.assertAllEqual([], dist.event_shape_tensor().eval()) - self.assertEqual(10, dist.event_size.eval()) - # event_size is available as a constant because the shape is - # known at graph build time. - self.assertEqual(10, tensor_util.constant_value(dist.event_size)) - - for batch_shape in ([], [1], [2, 3, 4]): - dist = make_categorical( - batch_shape, constant_op.constant( - 10, dtype=dtypes.int32)) - self.assertAllEqual(len(batch_shape), dist.batch_shape.ndims) - self.assertAllEqual(batch_shape, dist.batch_shape_tensor().eval()) - self.assertAllEqual([], dist.event_shape) - self.assertAllEqual([], dist.event_shape_tensor().eval()) - self.assertEqual(10, dist.event_size.eval()) - - def testDtype(self): - dist = make_categorical([], 5, dtype=dtypes.int32) - self.assertEqual(dist.dtype, dtypes.int32) - self.assertEqual(dist.dtype, dist.sample(5).dtype) - self.assertEqual(dist.dtype, dist.mode().dtype) - dist = make_categorical([], 5, dtype=dtypes.int64) - self.assertEqual(dist.dtype, dtypes.int64) - self.assertEqual(dist.dtype, dist.sample(5).dtype) - self.assertEqual(dist.dtype, dist.mode().dtype) - self.assertEqual(dist.probs.dtype, dtypes.float32) - self.assertEqual(dist.logits.dtype, dtypes.float32) - self.assertEqual(dist.logits.dtype, dist.entropy().dtype) - self.assertEqual( - dist.logits.dtype, dist.prob(np.array( - 0, dtype=np.int64)).dtype) - self.assertEqual( - dist.logits.dtype, dist.log_prob(np.array( - 0, dtype=np.int64)).dtype) - - def testUnknownShape(self): - with self.test_session(): - logits = array_ops.placeholder(dtype=dtypes.float32) - dist = categorical.Categorical(logits) - sample = dist.sample() - # Will sample class 1. - sample_value = sample.eval(feed_dict={logits: [-1000.0, 1000.0]}) - self.assertEqual(1, sample_value) - - # Batch entry 0 will sample class 1, batch entry 1 will sample class 0. - sample_value_batch = sample.eval( - feed_dict={logits: [[-1000.0, 1000.0], [1000.0, -1000.0]]}) - self.assertAllEqual([1, 0], sample_value_batch) - - def testPMFWithBatch(self): - histograms = [[0.2, 0.8], [0.6, 0.4]] - dist = categorical.Categorical(math_ops.log(histograms) - 50.) - with self.test_session(): - self.assertAllClose(dist.prob([0, 1]).eval(), [0.2, 0.4]) - - def testPMFNoBatch(self): - histograms = [0.2, 0.8] - dist = categorical.Categorical(math_ops.log(histograms) - 50.) - with self.test_session(): - self.assertAllClose(dist.prob(0).eval(), 0.2) - - def testLogPMF(self): - logits = np.log([[0.2, 0.8], [0.6, 0.4]]) - 50. - dist = categorical.Categorical(logits) - with self.test_session(): - self.assertAllClose(dist.log_prob([0, 1]).eval(), np.log([0.2, 0.4])) - - def testEntropyNoBatch(self): - logits = np.log([0.2, 0.8]) - 50. - dist = categorical.Categorical(logits) - with self.test_session(): - self.assertAllClose(dist.entropy().eval(), - -(0.2 * np.log(0.2) + 0.8 * np.log(0.8))) - - def testEntropyWithBatch(self): - logits = np.log([[0.2, 0.8], [0.6, 0.4]]) - 50. 
- dist = categorical.Categorical(logits) - with self.test_session(): - self.assertAllClose(dist.entropy().eval(), [ - -(0.2 * np.log(0.2) + 0.8 * np.log(0.8)), - -(0.6 * np.log(0.6) + 0.4 * np.log(0.4)) - ]) - - def testEntropyGradient(self): - with self.test_session() as sess: - logits = constant_op.constant([[1., 2., 3.], [2., 5., 1.]]) - - probabilities = nn_ops.softmax(logits) - log_probabilities = nn_ops.log_softmax(logits) - true_entropy = - math_ops.reduce_sum( - probabilities * log_probabilities, axis=-1) - - categorical_distribution = categorical.Categorical(probs=probabilities) - categorical_entropy = categorical_distribution.entropy() - - # works - true_entropy_g = gradients_impl.gradients(true_entropy, [logits]) - categorical_entropy_g = gradients_impl.gradients( - categorical_entropy, [logits]) - - res = sess.run({"true_entropy": true_entropy, - "categorical_entropy": categorical_entropy, - "true_entropy_g": true_entropy_g, - "categorical_entropy_g": categorical_entropy_g}) - self.assertAllClose(res["true_entropy"], - res["categorical_entropy"]) - self.assertAllClose(res["true_entropy_g"], - res["categorical_entropy_g"]) - - def testSample(self): - with self.test_session(): - histograms = [[[0.2, 0.8], [0.4, 0.6]]] - dist = categorical.Categorical(math_ops.log(histograms) - 50.) - n = 10000 - samples = dist.sample(n, seed=123) - samples.set_shape([n, 1, 2]) - self.assertEqual(samples.dtype, dtypes.int32) - sample_values = samples.eval() - self.assertFalse(np.any(sample_values < 0)) - self.assertFalse(np.any(sample_values > 1)) - self.assertAllClose( - [[0.2, 0.4]], np.mean( - sample_values == 0, axis=0), atol=1e-2) - self.assertAllClose( - [[0.8, 0.6]], np.mean( - sample_values == 1, axis=0), atol=1e-2) - - def testSampleWithSampleShape(self): - with self.test_session(): - histograms = [[[0.2, 0.8], [0.4, 0.6]]] - dist = categorical.Categorical(math_ops.log(histograms) - 50.) - samples = dist.sample((100, 100), seed=123) - prob = dist.prob(samples) - prob_val = prob.eval() - self.assertAllClose( - [0.2**2 + 0.8**2], [prob_val[:, :, :, 0].mean()], atol=1e-2) - self.assertAllClose( - [0.4**2 + 0.6**2], [prob_val[:, :, :, 1].mean()], atol=1e-2) - - def testLogPMFBroadcasting(self): - with self.test_session(): - histograms = [[[0.2, 0.8], [0.4, 0.6]]] - dist = categorical.Categorical(math_ops.log(histograms) - 50.) 
- - prob = dist.prob(1) - self.assertAllClose([[0.8, 0.6]], prob.eval()) - - prob = dist.prob([1]) - self.assertAllClose([[0.8, 0.6]], prob.eval()) - - prob = dist.prob([0, 1]) - self.assertAllClose([[0.2, 0.6]], prob.eval()) - - prob = dist.prob([[0, 1]]) - self.assertAllClose([[0.2, 0.6]], prob.eval()) - - prob = dist.prob([[[0, 1]]]) - self.assertAllClose([[[0.2, 0.6]]], prob.eval()) - - prob = dist.prob([[1, 0], [0, 1]]) - self.assertAllClose([[0.8, 0.4], [0.2, 0.6]], prob.eval()) - - prob = dist.prob([[[1, 1], [1, 0]], [[1, 0], [0, 1]]]) - self.assertAllClose([[[0.8, 0.6], [0.8, 0.4]], [[0.8, 0.4], [0.2, 0.6]]], - prob.eval()) - - def testLogPMFShape(self): - with self.test_session(): - # shape [1, 2, 2] - histograms = [[[0.2, 0.8], [0.4, 0.6]]] - dist = categorical.Categorical(math_ops.log(histograms)) - - log_prob = dist.log_prob([0, 1]) - self.assertEqual(2, log_prob.get_shape().ndims) - self.assertAllEqual([1, 2], log_prob.get_shape()) - - log_prob = dist.log_prob([[[1, 1], [1, 0]], [[1, 0], [0, 1]]]) - self.assertEqual(3, log_prob.get_shape().ndims) - self.assertAllEqual([2, 2, 2], log_prob.get_shape()) - - def testLogPMFShapeNoBatch(self): - histograms = [0.2, 0.8] - dist = categorical.Categorical(math_ops.log(histograms)) - - log_prob = dist.log_prob(0) - self.assertEqual(0, log_prob.get_shape().ndims) - self.assertAllEqual([], log_prob.get_shape()) - - log_prob = dist.log_prob([[[1, 1], [1, 0]], [[1, 0], [0, 1]]]) - self.assertEqual(3, log_prob.get_shape().ndims) - self.assertAllEqual([2, 2, 2], log_prob.get_shape()) - - def testMode(self): - with self.test_session(): - histograms = [[[0.2, 0.8], [0.6, 0.4]]] - dist = categorical.Categorical(math_ops.log(histograms) - 50.) - self.assertAllEqual(dist.mode().eval(), [[1, 0]]) - - def testCategoricalCategoricalKL(self): - - def np_softmax(logits): - exp_logits = np.exp(logits) - return exp_logits / exp_logits.sum(axis=-1, keepdims=True) - - with self.test_session() as sess: - for categories in [2, 4]: - for batch_size in [1, 10]: - a_logits = np.random.randn(batch_size, categories) - b_logits = np.random.randn(batch_size, categories) - - a = categorical.Categorical(logits=a_logits) - b = categorical.Categorical(logits=b_logits) - - kl = kullback_leibler.kl_divergence(a, b) - kl_val = sess.run(kl) - # Make sure KL(a||a) is 0 - kl_same = sess.run(kullback_leibler.kl_divergence(a, a)) - - prob_a = np_softmax(a_logits) - prob_b = np_softmax(b_logits) - kl_expected = np.sum(prob_a * (np.log(prob_a) - np.log(prob_b)), - axis=-1) - - self.assertEqual(kl.get_shape(), (batch_size,)) - self.assertAllClose(kl_val, kl_expected) - self.assertAllClose(kl_same, np.zeros_like(kl_expected)) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/dirichlet_multinomial_test.py b/tensorflow/contrib/distributions/python/kernel_tests/dirichlet_multinomial_test.py deleted file mode 100644 index bc25366cfa..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/dirichlet_multinomial_test.py +++ /dev/null @@ -1,479 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from tensorflow.contrib import distributions -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.platform import test - -ds = distributions - - -class DirichletMultinomialTest(test.TestCase): - - def setUp(self): - self._rng = np.random.RandomState(42) - - def testSimpleShapes(self): - with self.test_session(): - alpha = np.random.rand(3) - dist = ds.DirichletMultinomial(1., alpha) - self.assertEqual(3, dist.event_shape_tensor().eval()) - self.assertAllEqual([], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([3]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([]), dist.batch_shape) - - def testComplexShapes(self): - with self.test_session(): - alpha = np.random.rand(3, 2, 2) - n = [[3., 2], [4, 5], [6, 7]] - dist = ds.DirichletMultinomial(n, alpha) - self.assertEqual(2, dist.event_shape_tensor().eval()) - self.assertAllEqual([3, 2], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([2]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([3, 2]), dist.batch_shape) - - def testNproperty(self): - alpha = [[1., 2, 3]] - n = [[5.]] - with self.test_session(): - dist = ds.DirichletMultinomial(n, alpha) - self.assertEqual([1, 1], dist.total_count.get_shape()) - self.assertAllClose(n, dist.total_count.eval()) - - def testAlphaProperty(self): - alpha = [[1., 2, 3]] - with self.test_session(): - dist = ds.DirichletMultinomial(1, alpha) - self.assertEqual([1, 3], dist.concentration.get_shape()) - self.assertAllClose(alpha, dist.concentration.eval()) - - def testPmfNandCountsAgree(self): - alpha = [[1., 2, 3]] - n = [[5.]] - with self.test_session(): - dist = ds.DirichletMultinomial(n, alpha, validate_args=True) - dist.prob([2., 3, 0]).eval() - dist.prob([3., 0, 2]).eval() - with self.assertRaisesOpError("counts must be non-negative"): - dist.prob([-1., 4, 2]).eval() - with self.assertRaisesOpError( - "counts last-dimension must sum to `self.total_count`"): - dist.prob([3., 3, 0]).eval() - - def testPmfNonIntegerCounts(self): - alpha = [[1., 2, 3]] - n = [[5.]] - with self.test_session(): - dist = ds.DirichletMultinomial(n, alpha, validate_args=True) - dist.prob([2., 3, 0]).eval() - dist.prob([3., 0, 2]).eval() - dist.prob([3.0, 0, 2.0]).eval() - # Both equality and integer checking fail. - placeholder = array_ops.placeholder(dtypes.float32) - with self.assertRaisesOpError( - "counts cannot contain fractional components"): - dist.prob(placeholder).eval(feed_dict={placeholder: [1.0, 2.5, 1.5]}) - dist = ds.DirichletMultinomial(n, alpha, validate_args=False) - dist.prob([1., 2., 3.]).eval() - # Non-integer arguments work. 
- dist.prob([1.0, 2.5, 1.5]).eval() - - def testPmfBothZeroBatches(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - # Both zero-batches. No broadcast - alpha = [1., 2] - counts = [1., 0] - dist = ds.DirichletMultinomial(1., alpha) - pmf = dist.prob(counts) - self.assertAllClose(1 / 3., pmf.eval()) - self.assertEqual((), pmf.get_shape()) - - def testPmfBothZeroBatchesNontrivialN(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - # Both zero-batches. No broadcast - alpha = [1., 2] - counts = [3., 2] - dist = ds.DirichletMultinomial(5., alpha) - pmf = dist.prob(counts) - self.assertAllClose(1 / 7., pmf.eval()) - self.assertEqual((), pmf.get_shape()) - - def testPmfBothZeroBatchesMultidimensionalN(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - alpha = [1., 2] - counts = [3., 2] - n = np.full([4, 3], 5., dtype=np.float32) - dist = ds.DirichletMultinomial(n, alpha) - pmf = dist.prob(counts) - self.assertAllClose([[1 / 7., 1 / 7., 1 / 7.]] * 4, pmf.eval()) - self.assertEqual((4, 3), pmf.get_shape()) - - def testPmfAlphaStretchedInBroadcastWhenSameRank(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - alpha = [[1., 2]] - counts = [[1., 0], [0., 1]] - dist = ds.DirichletMultinomial([1.], alpha) - pmf = dist.prob(counts) - self.assertAllClose([1 / 3., 2 / 3.], pmf.eval()) - self.assertAllEqual([2], pmf.get_shape()) - - def testPmfAlphaStretchedInBroadcastWhenLowerRank(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - alpha = [1., 2] - counts = [[1., 0], [0., 1]] - pmf = ds.DirichletMultinomial(1., alpha).prob(counts) - self.assertAllClose([1 / 3., 2 / 3.], pmf.eval()) - self.assertAllEqual([2], pmf.get_shape()) - - def testPmfCountsStretchedInBroadcastWhenSameRank(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - alpha = [[1., 2], [2., 3]] - counts = [[1., 0]] - pmf = ds.DirichletMultinomial([1., 1.], alpha).prob(counts) - self.assertAllClose([1 / 3., 2 / 5.], pmf.eval()) - self.assertAllEqual([2], pmf.get_shape()) - - def testPmfCountsStretchedInBroadcastWhenLowerRank(self): - # The probabilities of one vote falling into class k is the mean for class - # k. - with self.test_session(): - alpha = [[1., 2], [2., 3]] - counts = [1., 0] - pmf = ds.DirichletMultinomial(1., alpha).prob(counts) - self.assertAllClose([1 / 3., 2 / 5.], pmf.eval()) - self.assertAllEqual([2], pmf.get_shape()) - - def testPmfForOneVoteIsTheMeanWithOneRecordInput(self): - # The probabilities of one vote falling into class k is the mean for class - # k. 
- alpha = [1., 2, 3] - with self.test_session(): - for class_num in range(3): - counts = np.zeros([3], dtype=np.float32) - counts[class_num] = 1 - dist = ds.DirichletMultinomial(1., alpha) - mean = dist.mean().eval() - pmf = dist.prob(counts).eval() - - self.assertAllClose(mean[class_num], pmf) - self.assertAllEqual([3], mean.shape) - self.assertAllEqual([], pmf.shape) - - def testMeanDoubleTwoVotes(self): - # The probabilities of two votes falling into class k for - # DirichletMultinomial(2, alpha) is twice as much as the probability of one - # vote falling into class k for DirichletMultinomial(1, alpha) - alpha = [1., 2, 3] - with self.test_session(): - for class_num in range(3): - counts_one = np.zeros([3], dtype=np.float32) - counts_one[class_num] = 1. - counts_two = np.zeros([3], dtype=np.float32) - counts_two[class_num] = 2 - - dist1 = ds.DirichletMultinomial(1., alpha) - dist2 = ds.DirichletMultinomial(2., alpha) - - mean1 = dist1.mean().eval() - mean2 = dist2.mean().eval() - - self.assertAllClose(mean2[class_num], 2 * mean1[class_num]) - self.assertAllEqual([3], mean1.shape) - - def testCovarianceFromSampling(self): - # We will test mean, cov, var, stddev on a DirichletMultinomial constructed - # via broadcast between alpha, n. - alpha = np.array([[1., 2, 3], - [2.5, 4, 0.01]], dtype=np.float32) - # Ideally we'd be able to test broadcasting but, the multinomial sampler - # doesn't support different total counts. - n = np.float32(5) - with self.test_session() as sess: - # batch_shape=[2], event_shape=[3] - dist = ds.DirichletMultinomial(n, alpha) - x = dist.sample(int(250e3), seed=1) - sample_mean = math_ops.reduce_mean(x, 0) - x_centered = x - sample_mean[array_ops.newaxis, ...] - sample_cov = math_ops.reduce_mean(math_ops.matmul( - x_centered[..., array_ops.newaxis], - x_centered[..., array_ops.newaxis, :]), 0) - sample_var = array_ops.matrix_diag_part(sample_cov) - sample_stddev = math_ops.sqrt(sample_var) - [ - sample_mean_, - sample_cov_, - sample_var_, - sample_stddev_, - analytic_mean, - analytic_cov, - analytic_var, - analytic_stddev, - ] = sess.run([ - sample_mean, - sample_cov, - sample_var, - sample_stddev, - dist.mean(), - dist.covariance(), - dist.variance(), - dist.stddev(), - ]) - self.assertAllClose(sample_mean_, analytic_mean, atol=0., rtol=0.04) - self.assertAllClose(sample_cov_, analytic_cov, atol=0., rtol=0.05) - self.assertAllClose(sample_var_, analytic_var, atol=0., rtol=0.03) - self.assertAllClose(sample_stddev_, analytic_stddev, atol=0., rtol=0.02) - - def testCovariance(self): - # Shape [2] - alpha = [1., 2] - ns = [2., 3., 4., 5.] - alpha_0 = np.sum(alpha) - - # Diagonal entries are of the form: - # Var(X_i) = n * alpha_i / alpha_sum * (1 - alpha_i / alpha_sum) * - # (alpha_sum + n) / (alpha_sum + 1) - variance_entry = lambda a, a_sum: a / a_sum * (1 - a / a_sum) - # Off diagonal entries are of the form: - # Cov(X_i, X_j) = -n * alpha_i * alpha_j / (alpha_sum ** 2) * - # (alpha_sum + n) / (alpha_sum + 1) - covariance_entry = lambda a, b, a_sum: -a * b / a_sum**2 - # Shape [2, 2]. - shared_matrix = np.array([[ - variance_entry(alpha[0], alpha_0), - covariance_entry(alpha[0], alpha[1], alpha_0) - ], [ - covariance_entry(alpha[1], alpha[0], alpha_0), - variance_entry(alpha[1], alpha_0) - ]]) - - with self.test_session(): - for n in ns: - # n is shape [] and alpha is shape [2]. 
- dist = ds.DirichletMultinomial(n, alpha) - covariance = dist.covariance() - expected_covariance = n * (n + alpha_0) / (1 + alpha_0) * shared_matrix - - self.assertEqual([2, 2], covariance.get_shape()) - self.assertAllClose(expected_covariance, covariance.eval()) - - def testCovarianceNAlphaBroadcast(self): - alpha_v = [1., 2, 3] - alpha_0 = 6. - - # Shape [4, 3] - alpha = np.array(4 * [alpha_v], dtype=np.float32) - # Shape [4, 1] - ns = np.array([[2.], [3.], [4.], [5.]], dtype=np.float32) - - variance_entry = lambda a, a_sum: a / a_sum * (1 - a / a_sum) - covariance_entry = lambda a, b, a_sum: -a * b / a_sum**2 - # Shape [4, 3, 3] - shared_matrix = np.array( - 4 * [[[ - variance_entry(alpha_v[0], alpha_0), - covariance_entry(alpha_v[0], alpha_v[1], alpha_0), - covariance_entry(alpha_v[0], alpha_v[2], alpha_0) - ], [ - covariance_entry(alpha_v[1], alpha_v[0], alpha_0), - variance_entry(alpha_v[1], alpha_0), - covariance_entry(alpha_v[1], alpha_v[2], alpha_0) - ], [ - covariance_entry(alpha_v[2], alpha_v[0], alpha_0), - covariance_entry(alpha_v[2], alpha_v[1], alpha_0), - variance_entry(alpha_v[2], alpha_0) - ]]], - dtype=np.float32) - - with self.test_session(): - # ns is shape [4, 1], and alpha is shape [4, 3]. - dist = ds.DirichletMultinomial(ns, alpha) - covariance = dist.covariance() - expected_covariance = shared_matrix * ( - ns * (ns + alpha_0) / (1 + alpha_0))[..., array_ops.newaxis] - - self.assertEqual([4, 3, 3], covariance.get_shape()) - self.assertAllClose(expected_covariance, covariance.eval()) - - def testCovarianceMultidimensional(self): - alpha = np.random.rand(3, 5, 4).astype(np.float32) - alpha2 = np.random.rand(6, 3, 3).astype(np.float32) - - ns = np.random.randint(low=1, high=11, size=[3, 5, 1]).astype(np.float32) - ns2 = np.random.randint(low=1, high=11, size=[6, 1, 1]).astype(np.float32) - - with self.test_session(): - dist = ds.DirichletMultinomial(ns, alpha) - dist2 = ds.DirichletMultinomial(ns2, alpha2) - - covariance = dist.covariance() - covariance2 = dist2.covariance() - self.assertEqual([3, 5, 4, 4], covariance.get_shape()) - self.assertEqual([6, 3, 3, 3], covariance2.get_shape()) - - def testZeroCountsResultsInPmfEqualToOne(self): - # There is only one way for zero items to be selected, and this happens with - # probability 1. - alpha = [5, 0.5] - counts = [0., 0] - with self.test_session(): - dist = ds.DirichletMultinomial(0., alpha) - pmf = dist.prob(counts) - self.assertAllClose(1.0, pmf.eval()) - self.assertEqual((), pmf.get_shape()) - - def testLargeTauGivesPreciseProbabilities(self): - # If tau is large, we are doing coin flips with probability mu. - mu = np.array([0.1, 0.1, 0.8], dtype=np.float32) - tau = np.array([100.], dtype=np.float32) - alpha = tau * mu - - # One (three sided) coin flip. Prob[coin 3] = 0.8. - # Note that since it was one flip, value of tau didn't matter. - counts = [0., 0, 1] - with self.test_session(): - dist = ds.DirichletMultinomial(1., alpha) - pmf = dist.prob(counts) - self.assertAllClose(0.8, pmf.eval(), atol=1e-4) - self.assertEqual((), pmf.get_shape()) - - # Two (three sided) coin flips. Prob[coin 3] = 0.8. - counts = [0., 0, 2] - with self.test_session(): - dist = ds.DirichletMultinomial(2., alpha) - pmf = dist.prob(counts) - self.assertAllClose(0.8**2, pmf.eval(), atol=1e-2) - self.assertEqual((), pmf.get_shape()) - - # Three (three sided) coin flips. 
- counts = [1., 0, 2] - with self.test_session(): - dist = ds.DirichletMultinomial(3., alpha) - pmf = dist.prob(counts) - self.assertAllClose(3 * 0.1 * 0.8 * 0.8, pmf.eval(), atol=1e-2) - self.assertEqual((), pmf.get_shape()) - - def testSmallTauPrefersCorrelatedResults(self): - # If tau is small, then correlation between draws is large, so draws that - # are both of the same class are more likely. - mu = np.array([0.5, 0.5], dtype=np.float32) - tau = np.array([0.1], dtype=np.float32) - alpha = tau * mu - - # If there is only one draw, it is still a coin flip, even with small tau. - counts = [1., 0] - with self.test_session(): - dist = ds.DirichletMultinomial(1., alpha) - pmf = dist.prob(counts) - self.assertAllClose(0.5, pmf.eval()) - self.assertEqual((), pmf.get_shape()) - - # If there are two draws, it is much more likely that they are the same. - counts_same = [2., 0] - counts_different = [1, 1.] - with self.test_session(): - dist = ds.DirichletMultinomial(2., alpha) - pmf_same = dist.prob(counts_same) - pmf_different = dist.prob(counts_different) - self.assertLess(5 * pmf_different.eval(), pmf_same.eval()) - self.assertEqual((), pmf_same.get_shape()) - - def testNonStrictTurnsOffAllChecks(self): - # Make totally invalid input. - with self.test_session(): - alpha = [[-1., 2]] # alpha should be positive. - counts = [[1., 0], [0., -1]] # counts should be non-negative. - n = [-5.3] # n should be a non negative integer equal to counts.sum. - dist = ds.DirichletMultinomial(n, alpha, validate_args=False) - dist.prob(counts).eval() # Should not raise. - - def testSampleUnbiasedNonScalarBatch(self): - with self.test_session() as sess: - dist = ds.DirichletMultinomial( - total_count=5., - concentration=1. + 2. * self._rng.rand(4, 3, 2).astype(np.float32)) - n = int(3e3) - x = dist.sample(n, seed=0) - sample_mean = math_ops.reduce_mean(x, 0) - # Cyclically rotate event dims left. - x_centered = array_ops.transpose(x - sample_mean, [1, 2, 3, 0]) - sample_covariance = math_ops.matmul( - x_centered, x_centered, adjoint_b=True) / n - [ - sample_mean_, - sample_covariance_, - actual_mean_, - actual_covariance_, - ] = sess.run([ - sample_mean, - sample_covariance, - dist.mean(), - dist.covariance(), - ]) - self.assertAllEqual([4, 3, 2], sample_mean.get_shape()) - self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.15) - self.assertAllEqual([4, 3, 2, 2], sample_covariance.get_shape()) - self.assertAllClose( - actual_covariance_, sample_covariance_, atol=0., rtol=0.20) - - def testSampleUnbiasedScalarBatch(self): - with self.test_session() as sess: - dist = ds.DirichletMultinomial( - total_count=5., - concentration=1. + 2. * self._rng.rand(4).astype(np.float32)) - n = int(5e3) - x = dist.sample(n, seed=0) - sample_mean = math_ops.reduce_mean(x, 0) - x_centered = x - sample_mean # Already transposed to [n, 2]. 
- sample_covariance = math_ops.matmul( - x_centered, x_centered, adjoint_a=True) / n - [ - sample_mean_, - sample_covariance_, - actual_mean_, - actual_covariance_, - ] = sess.run([ - sample_mean, - sample_covariance, - dist.mean(), - dist.covariance(), - ]) - self.assertAllEqual([4], sample_mean.get_shape()) - self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.05) - self.assertAllEqual([4, 4], sample_covariance.get_shape()) - self.assertAllClose( - actual_covariance_, sample_covariance_, atol=0., rtol=0.15) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/dirichlet_test.py b/tensorflow/contrib/distributions/python/kernel_tests/dirichlet_test.py deleted file mode 100644 index cd634da09d..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/dirichlet_test.py +++ /dev/null @@ -1,240 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from scipy import stats -from tensorflow.contrib.distributions.python.ops import dirichlet as dirichlet_lib -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.platform import test - - -class DirichletTest(test.TestCase): - - def testSimpleShapes(self): - with self.test_session(): - alpha = np.random.rand(3) - dist = dirichlet_lib.Dirichlet(alpha) - self.assertEqual(3, dist.event_shape_tensor().eval()) - self.assertAllEqual([], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([3]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([]), dist.batch_shape) - - def testComplexShapes(self): - with self.test_session(): - alpha = np.random.rand(3, 2, 2) - dist = dirichlet_lib.Dirichlet(alpha) - self.assertEqual(2, dist.event_shape_tensor().eval()) - self.assertAllEqual([3, 2], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([2]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([3, 2]), dist.batch_shape) - - def testConcentrationProperty(self): - alpha = [[1., 2, 3]] - with self.test_session(): - dist = dirichlet_lib.Dirichlet(alpha) - self.assertEqual([1, 3], dist.concentration.get_shape()) - self.assertAllClose(alpha, dist.concentration.eval()) - - def testPdfXProper(self): - alpha = [[1., 2, 3]] - with self.test_session(): - dist = dirichlet_lib.Dirichlet(alpha, validate_args=True) - dist.prob([.1, .3, .6]).eval() - dist.prob([.2, .3, .5]).eval() - # Either condition can trigger. 
- with self.assertRaisesOpError("samples must be positive"): - dist.prob([-1., 1.5, 0.5]).eval() - with self.assertRaisesOpError("samples must be positive"): - dist.prob([0., .1, .9]).eval() - with self.assertRaisesOpError( - "sample last-dimension must sum to `1`"): - dist.prob([.1, .2, .8]).eval() - - def testPdfZeroBatches(self): - with self.test_session(): - alpha = [1., 2] - x = [.5, .5] - dist = dirichlet_lib.Dirichlet(alpha) - pdf = dist.prob(x) - self.assertAllClose(1., pdf.eval()) - self.assertEqual((), pdf.get_shape()) - - def testPdfZeroBatchesNontrivialX(self): - with self.test_session(): - alpha = [1., 2] - x = [.3, .7] - dist = dirichlet_lib.Dirichlet(alpha) - pdf = dist.prob(x) - self.assertAllClose(7. / 5, pdf.eval()) - self.assertEqual((), pdf.get_shape()) - - def testPdfUniformZeroBatches(self): - with self.test_session(): - # Corresponds to a uniform distribution - alpha = [1., 1, 1] - x = [[.2, .5, .3], [.3, .4, .3]] - dist = dirichlet_lib.Dirichlet(alpha) - pdf = dist.prob(x) - self.assertAllClose([2., 2.], pdf.eval()) - self.assertEqual((2), pdf.get_shape()) - - def testPdfAlphaStretchedInBroadcastWhenSameRank(self): - with self.test_session(): - alpha = [[1., 2]] - x = [[.5, .5], [.3, .7]] - dist = dirichlet_lib.Dirichlet(alpha) - pdf = dist.prob(x) - self.assertAllClose([1., 7. / 5], pdf.eval()) - self.assertEqual((2), pdf.get_shape()) - - def testPdfAlphaStretchedInBroadcastWhenLowerRank(self): - with self.test_session(): - alpha = [1., 2] - x = [[.5, .5], [.2, .8]] - pdf = dirichlet_lib.Dirichlet(alpha).prob(x) - self.assertAllClose([1., 8. / 5], pdf.eval()) - self.assertEqual((2), pdf.get_shape()) - - def testPdfXStretchedInBroadcastWhenSameRank(self): - with self.test_session(): - alpha = [[1., 2], [2., 3]] - x = [[.5, .5]] - pdf = dirichlet_lib.Dirichlet(alpha).prob(x) - self.assertAllClose([1., 3. / 2], pdf.eval()) - self.assertEqual((2), pdf.get_shape()) - - def testPdfXStretchedInBroadcastWhenLowerRank(self): - with self.test_session(): - alpha = [[1., 2], [2., 3]] - x = [.5, .5] - pdf = dirichlet_lib.Dirichlet(alpha).prob(x) - self.assertAllClose([1., 3. / 2], pdf.eval()) - self.assertEqual((2), pdf.get_shape()) - - def testMean(self): - with self.test_session(): - alpha = [1., 2, 3] - expected_mean = stats.dirichlet.mean(alpha) - dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) - self.assertEqual(dirichlet.mean().get_shape(), [3]) - self.assertAllClose(dirichlet.mean().eval(), expected_mean) - - def testCovarianceFromSampling(self): - alpha = np.array([[1., 2, 3], - [2.5, 4, 0.01]], dtype=np.float32) - with self.test_session() as sess: - dist = dirichlet_lib.Dirichlet(alpha) # batch_shape=[2], event_shape=[3] - x = dist.sample(int(250e3), seed=1) - sample_mean = math_ops.reduce_mean(x, 0) - x_centered = x - sample_mean[None, ...] 
- sample_cov = math_ops.reduce_mean(math_ops.matmul( - x_centered[..., None], x_centered[..., None, :]), 0) - sample_var = array_ops.matrix_diag_part(sample_cov) - sample_stddev = math_ops.sqrt(sample_var) - [ - sample_mean_, - sample_cov_, - sample_var_, - sample_stddev_, - analytic_mean, - analytic_cov, - analytic_var, - analytic_stddev, - ] = sess.run([ - sample_mean, - sample_cov, - sample_var, - sample_stddev, - dist.mean(), - dist.covariance(), - dist.variance(), - dist.stddev(), - ]) - self.assertAllClose(sample_mean_, analytic_mean, atol=0., rtol=0.04) - self.assertAllClose(sample_cov_, analytic_cov, atol=0., rtol=0.06) - self.assertAllClose(sample_var_, analytic_var, atol=0., rtol=0.03) - self.assertAllClose(sample_stddev_, analytic_stddev, atol=0., rtol=0.02) - - def testVariance(self): - with self.test_session(): - alpha = [1., 2, 3] - denominator = np.sum(alpha)**2 * (np.sum(alpha) + 1) - expected_covariance = np.diag(stats.dirichlet.var(alpha)) - expected_covariance += [[0., -2, -3], [-2, 0, -6], - [-3, -6, 0]] / denominator - dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) - self.assertEqual(dirichlet.covariance().get_shape(), (3, 3)) - self.assertAllClose(dirichlet.covariance().eval(), expected_covariance) - - def testMode(self): - with self.test_session(): - alpha = np.array([1.1, 2, 3]) - expected_mode = (alpha - 1) / (np.sum(alpha) - 3) - dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) - self.assertEqual(dirichlet.mode().get_shape(), [3]) - self.assertAllClose(dirichlet.mode().eval(), expected_mode) - - def testModeInvalid(self): - with self.test_session(): - alpha = np.array([1., 2, 3]) - dirichlet = dirichlet_lib.Dirichlet(concentration=alpha, - allow_nan_stats=False) - with self.assertRaisesOpError("Condition x < y.*"): - dirichlet.mode().eval() - - def testModeEnableAllowNanStats(self): - with self.test_session(): - alpha = np.array([1., 2, 3]) - dirichlet = dirichlet_lib.Dirichlet(concentration=alpha, - allow_nan_stats=True) - expected_mode = np.zeros_like(alpha) + np.nan - - self.assertEqual(dirichlet.mode().get_shape(), [3]) - self.assertAllClose(dirichlet.mode().eval(), expected_mode) - - def testEntropy(self): - with self.test_session(): - alpha = [1., 2, 3] - expected_entropy = stats.dirichlet.entropy(alpha) - dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) - self.assertEqual(dirichlet.entropy().get_shape(), ()) - self.assertAllClose(dirichlet.entropy().eval(), expected_entropy) - - def testSample(self): - with self.test_session(): - alpha = [1., 2] - dirichlet = dirichlet_lib.Dirichlet(alpha) - n = constant_op.constant(100000) - samples = dirichlet.sample(n) - sample_values = samples.eval() - self.assertEqual(sample_values.shape, (100000, 2)) - self.assertTrue(np.all(sample_values > 0.0)) - self.assertLess( - stats.kstest( - # Beta is a univariate distribution. - sample_values[:, 0], - stats.beta( - a=1., b=2.).cdf)[0], - 0.01) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/exponential_test.py b/tensorflow/contrib/distributions/python/kernel_tests/exponential_test.py deleted file mode 100644 index 6171202413..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/exponential_test.py +++ /dev/null @@ -1,140 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""Tests for initializers.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from scipy import stats -from tensorflow.contrib.distributions.python.ops import exponential as exponential_lib -from tensorflow.python.client import session -from tensorflow.python.framework import constant_op -from tensorflow.python.ops import nn_ops -from tensorflow.python.platform import test - - -class ExponentialTest(test.TestCase): - - def testExponentialLogPDF(self): - with session.Session(): - batch_size = 6 - lam = constant_op.constant([2.0] * batch_size) - lam_v = 2.0 - x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) - exponential = exponential_lib.Exponential(rate=lam) - expected_log_pdf = stats.expon.logpdf(x, scale=1 / lam_v) - - log_pdf = exponential.log_prob(x) - self.assertEqual(log_pdf.get_shape(), (6,)) - self.assertAllClose(log_pdf.eval(), expected_log_pdf) - - pdf = exponential.prob(x) - self.assertEqual(pdf.get_shape(), (6,)) - self.assertAllClose(pdf.eval(), np.exp(expected_log_pdf)) - - def testExponentialCDF(self): - with session.Session(): - batch_size = 6 - lam = constant_op.constant([2.0] * batch_size) - lam_v = 2.0 - x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) - - exponential = exponential_lib.Exponential(rate=lam) - expected_cdf = stats.expon.cdf(x, scale=1 / lam_v) - - cdf = exponential.cdf(x) - self.assertEqual(cdf.get_shape(), (6,)) - self.assertAllClose(cdf.eval(), expected_cdf) - - def testExponentialMean(self): - with session.Session(): - lam_v = np.array([1.0, 4.0, 2.5]) - expected_mean = stats.expon.mean(scale=1 / lam_v) - exponential = exponential_lib.Exponential(rate=lam_v) - self.assertEqual(exponential.mean().get_shape(), (3,)) - self.assertAllClose(exponential.mean().eval(), expected_mean) - - def testExponentialVariance(self): - with session.Session(): - lam_v = np.array([1.0, 4.0, 2.5]) - expected_variance = stats.expon.var(scale=1 / lam_v) - exponential = exponential_lib.Exponential(rate=lam_v) - self.assertEqual(exponential.variance().get_shape(), (3,)) - self.assertAllClose(exponential.variance().eval(), expected_variance) - - def testExponentialEntropy(self): - with session.Session(): - lam_v = np.array([1.0, 4.0, 2.5]) - expected_entropy = stats.expon.entropy(scale=1 / lam_v) - exponential = exponential_lib.Exponential(rate=lam_v) - self.assertEqual(exponential.entropy().get_shape(), (3,)) - self.assertAllClose(exponential.entropy().eval(), expected_entropy) - - def testExponentialSample(self): - with self.test_session(): - lam = constant_op.constant([3.0, 4.0]) - lam_v = [3.0, 4.0] - n = constant_op.constant(100000) - exponential = exponential_lib.Exponential(rate=lam) - - samples = exponential.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(sample_values.shape, (100000, 2)) - self.assertFalse(np.any(sample_values < 0.0)) - for i in range(2): - self.assertLess( - stats.kstest( - sample_values[:, i], stats.expon(scale=1.0 / 
lam_v[i]).cdf)[0], - 0.01) - - def testExponentialSampleMultiDimensional(self): - with self.test_session(): - batch_size = 2 - lam_v = [3.0, 22.0] - lam = constant_op.constant([lam_v] * batch_size) - - exponential = exponential_lib.Exponential(rate=lam) - - n = 100000 - samples = exponential.sample(n, seed=138) - self.assertEqual(samples.get_shape(), (n, batch_size, 2)) - - sample_values = samples.eval() - - self.assertFalse(np.any(sample_values < 0.0)) - for i in range(2): - self.assertLess( - stats.kstest( - sample_values[:, 0, i], - stats.expon(scale=1.0 / lam_v[i]).cdf)[0], - 0.01) - self.assertLess( - stats.kstest( - sample_values[:, 1, i], - stats.expon(scale=1.0 / lam_v[i]).cdf)[0], - 0.01) - - def testExponentialWithSoftplusRate(self): - with self.test_session(): - lam = [-2.2, -3.4] - exponential = exponential_lib.ExponentialWithSoftplusRate(rate=lam) - self.assertAllClose(nn_ops.softplus(lam).eval(), - exponential.rate.eval()) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/gamma_test.py b/tensorflow/contrib/distributions/python/kernel_tests/gamma_test.py deleted file mode 100644 index 5ccf2308a5..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/gamma_test.py +++ /dev/null @@ -1,366 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ============================================================================== - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from scipy import special -from scipy import stats - -from tensorflow.contrib.distributions.python.ops import gamma as gamma_lib -from tensorflow.python.client import session -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn_ops -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.platform import test - - -class GammaTest(test.TestCase): - - def testGammaShape(self): - with self.test_session(): - alpha = constant_op.constant([3.0] * 5) - beta = constant_op.constant(11.0) - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - - self.assertEqual(gamma.batch_shape_tensor().eval(), (5,)) - self.assertEqual(gamma.batch_shape, tensor_shape.TensorShape([5])) - self.assertAllEqual(gamma.event_shape_tensor().eval(), []) - self.assertEqual(gamma.event_shape, tensor_shape.TensorShape([])) - - def testGammaLogPDF(self): - with self.test_session(): - batch_size = 6 - alpha = constant_op.constant([2.0] * batch_size) - beta = constant_op.constant([3.0] * batch_size) - alpha_v = 2.0 - beta_v = 3.0 - x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - expected_log_pdf = stats.gamma.logpdf(x, alpha_v, scale=1 / beta_v) - log_pdf = gamma.log_prob(x) - self.assertEqual(log_pdf.get_shape(), (6,)) - self.assertAllClose(log_pdf.eval(), expected_log_pdf) - - pdf = gamma.prob(x) - self.assertEqual(pdf.get_shape(), (6,)) - self.assertAllClose(pdf.eval(), np.exp(expected_log_pdf)) - - def testGammaLogPDFMultidimensional(self): - with self.test_session(): - batch_size = 6 - alpha = constant_op.constant([[2.0, 4.0]] * batch_size) - beta = constant_op.constant([[3.0, 4.0]] * batch_size) - alpha_v = np.array([2.0, 4.0]) - beta_v = np.array([3.0, 4.0]) - x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - expected_log_pdf = stats.gamma.logpdf(x, alpha_v, scale=1 / beta_v) - log_pdf = gamma.log_prob(x) - log_pdf_values = log_pdf.eval() - self.assertEqual(log_pdf.get_shape(), (6, 2)) - self.assertAllClose(log_pdf_values, expected_log_pdf) - - pdf = gamma.prob(x) - pdf_values = pdf.eval() - self.assertEqual(pdf.get_shape(), (6, 2)) - self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) - - def testGammaLogPDFMultidimensionalBroadcasting(self): - with self.test_session(): - batch_size = 6 - alpha = constant_op.constant([[2.0, 4.0]] * batch_size) - beta = constant_op.constant(3.0) - alpha_v = np.array([2.0, 4.0]) - beta_v = 3.0 - x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - expected_log_pdf = stats.gamma.logpdf(x, alpha_v, scale=1 / beta_v) - log_pdf = gamma.log_prob(x) - log_pdf_values = log_pdf.eval() - self.assertEqual(log_pdf.get_shape(), (6, 2)) - self.assertAllClose(log_pdf_values, expected_log_pdf) - - pdf = gamma.prob(x) - pdf_values = pdf.eval() - self.assertEqual(pdf.get_shape(), (6, 2)) - self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) - - def testGammaCDF(self): - with self.test_session(): - batch_size = 6 - alpha = constant_op.constant([2.0] * batch_size) - beta = 
constant_op.constant([3.0] * batch_size) - alpha_v = 2.0 - beta_v = 3.0 - x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) - - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - expected_cdf = stats.gamma.cdf(x, alpha_v, scale=1 / beta_v) - - cdf = gamma.cdf(x) - self.assertEqual(cdf.get_shape(), (6,)) - self.assertAllClose(cdf.eval(), expected_cdf) - - def testGammaMean(self): - with self.test_session(): - alpha_v = np.array([1.0, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) - expected_means = stats.gamma.mean(alpha_v, scale=1 / beta_v) - self.assertEqual(gamma.mean().get_shape(), (3,)) - self.assertAllClose(gamma.mean().eval(), expected_means) - - def testGammaModeAllowNanStatsIsFalseWorksWhenAllBatchMembersAreDefined(self): - with self.test_session(): - alpha_v = np.array([5.5, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) - expected_modes = (alpha_v - 1) / beta_v - self.assertEqual(gamma.mode().get_shape(), (3,)) - self.assertAllClose(gamma.mode().eval(), expected_modes) - - def testGammaModeAllowNanStatsFalseRaisesForUndefinedBatchMembers(self): - with self.test_session(): - # Mode will not be defined for the first entry. - alpha_v = np.array([0.5, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - gamma = gamma_lib.Gamma(concentration=alpha_v, - rate=beta_v, - allow_nan_stats=False) - with self.assertRaisesOpError("x < y"): - gamma.mode().eval() - - def testGammaModeAllowNanStatsIsTrueReturnsNaNforUndefinedBatchMembers(self): - with self.test_session(): - # Mode will not be defined for the first entry. - alpha_v = np.array([0.5, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - gamma = gamma_lib.Gamma(concentration=alpha_v, - rate=beta_v, - allow_nan_stats=True) - expected_modes = (alpha_v - 1) / beta_v - expected_modes[0] = np.nan - self.assertEqual(gamma.mode().get_shape(), (3,)) - self.assertAllClose(gamma.mode().eval(), expected_modes) - - def testGammaVariance(self): - with self.test_session(): - alpha_v = np.array([1.0, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) - expected_variances = stats.gamma.var(alpha_v, scale=1 / beta_v) - self.assertEqual(gamma.variance().get_shape(), (3,)) - self.assertAllClose(gamma.variance().eval(), expected_variances) - - def testGammaStd(self): - with self.test_session(): - alpha_v = np.array([1.0, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) - expected_stddev = stats.gamma.std(alpha_v, scale=1. 
/ beta_v) - self.assertEqual(gamma.stddev().get_shape(), (3,)) - self.assertAllClose(gamma.stddev().eval(), expected_stddev) - - def testGammaEntropy(self): - with self.test_session(): - alpha_v = np.array([1.0, 3.0, 2.5]) - beta_v = np.array([1.0, 4.0, 5.0]) - expected_entropy = stats.gamma.entropy(alpha_v, scale=1 / beta_v) - gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) - self.assertEqual(gamma.entropy().get_shape(), (3,)) - self.assertAllClose(gamma.entropy().eval(), expected_entropy) - - def testGammaSampleSmallAlpha(self): - with session.Session(): - alpha_v = 0.05 - beta_v = 1.0 - alpha = constant_op.constant(alpha_v) - beta = constant_op.constant(beta_v) - n = 100000 - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - samples = gamma.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(samples.get_shape(), (n,)) - self.assertEqual(sample_values.shape, (n,)) - self.assertAllClose( - sample_values.mean(), - stats.gamma.mean( - alpha_v, scale=1 / beta_v), - atol=.01) - self.assertAllClose( - sample_values.var(), - stats.gamma.var(alpha_v, scale=1 / beta_v), - atol=.15) - self.assertTrue(self._kstest(alpha_v, beta_v, sample_values)) - - def testGammaSample(self): - with session.Session(): - alpha_v = 4.0 - beta_v = 3.0 - alpha = constant_op.constant(alpha_v) - beta = constant_op.constant(beta_v) - n = 100000 - gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) - samples = gamma.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(samples.get_shape(), (n,)) - self.assertEqual(sample_values.shape, (n,)) - self.assertAllClose( - sample_values.mean(), - stats.gamma.mean( - alpha_v, scale=1 / beta_v), - atol=.01) - self.assertAllClose( - sample_values.var(), - stats.gamma.var(alpha_v, scale=1 / beta_v), - atol=.15) - self.assertTrue(self._kstest(alpha_v, beta_v, sample_values)) - - def testGammaSampleMultiDimensional(self): - with session.Session(): - alpha_v = np.array([np.arange(1, 101, dtype=np.float32)]) # 1 x 100 - beta_v = np.array([np.arange(1, 11, dtype=np.float32)]).T # 10 x 1 - gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) - n = 10000 - samples = gamma.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(samples.get_shape(), (n, 10, 100)) - self.assertEqual(sample_values.shape, (n, 10, 100)) - zeros = np.zeros_like(alpha_v + beta_v) # 10 x 100 - alpha_bc = alpha_v + zeros - beta_bc = beta_v + zeros - self.assertAllClose( - sample_values.mean(axis=0), - stats.gamma.mean( - alpha_bc, scale=1 / beta_bc), - rtol=.035) - self.assertAllClose( - sample_values.var(axis=0), - stats.gamma.var(alpha_bc, scale=1 / beta_bc), - atol=4.5) - fails = 0 - trials = 0 - for ai, a in enumerate(np.reshape(alpha_v, [-1])): - for bi, b in enumerate(np.reshape(beta_v, [-1])): - s = sample_values[:, bi, ai] - trials += 1 - fails += 0 if self._kstest(a, b, s) else 1 - self.assertLess(fails, trials * 0.03) - - def _kstest(self, alpha, beta, samples): - # Uses the Kolmogorov-Smirnov test for goodness of fit. - ks, _ = stats.kstest(samples, stats.gamma(alpha, scale=1 / beta).cdf) - # Return True when the test passes. 
- return ks < 0.02 - - def testGammaPdfOfSampleMultiDims(self): - with session.Session() as sess: - gamma = gamma_lib.Gamma(concentration=[7., 11.], rate=[[5.], [6.]]) - num = 50000 - samples = gamma.sample(num, seed=137) - pdfs = gamma.prob(samples) - sample_vals, pdf_vals = sess.run([samples, pdfs]) - self.assertEqual(samples.get_shape(), (num, 2, 2)) - self.assertEqual(pdfs.get_shape(), (num, 2, 2)) - self.assertAllClose( - stats.gamma.mean( - [[7., 11.], [7., 11.]], scale=1 / np.array([[5., 5.], [6., 6.]])), - sample_vals.mean(axis=0), - atol=.1) - self.assertAllClose( - stats.gamma.var([[7., 11.], [7., 11.]], - scale=1 / np.array([[5., 5.], [6., 6.]])), - sample_vals.var(axis=0), - atol=.1) - self._assertIntegral(sample_vals[:, 0, 0], pdf_vals[:, 0, 0], err=0.02) - self._assertIntegral(sample_vals[:, 0, 1], pdf_vals[:, 0, 1], err=0.02) - self._assertIntegral(sample_vals[:, 1, 0], pdf_vals[:, 1, 0], err=0.02) - self._assertIntegral(sample_vals[:, 1, 1], pdf_vals[:, 1, 1], err=0.02) - - def _assertIntegral(self, sample_vals, pdf_vals, err=1e-3): - s_p = zip(sample_vals, pdf_vals) - prev = (0, 0) - total = 0 - for k in sorted(s_p, key=lambda x: x[0]): - pair_pdf = (k[1] + prev[1]) / 2 - total += (k[0] - prev[0]) * pair_pdf - prev = k - self.assertNear(1., total, err=err) - - def testGammaNonPositiveInitializationParamsRaises(self): - with self.test_session(): - alpha_v = constant_op.constant(0.0, name="alpha") - beta_v = constant_op.constant(1.0, name="beta") - gamma = gamma_lib.Gamma(concentration=alpha_v, - rate=beta_v, - validate_args=True) - with self.assertRaisesOpError("alpha"): - gamma.mean().eval() - alpha_v = constant_op.constant(1.0, name="alpha") - beta_v = constant_op.constant(0.0, name="beta") - gamma = gamma_lib.Gamma(concentration=alpha_v, - rate=beta_v, - validate_args=True) - with self.assertRaisesOpError("beta"): - gamma.mean().eval() - - def testGammaWithSoftplusConcentrationRate(self): - with self.test_session(): - alpha_v = constant_op.constant([0.0, -2.1], name="alpha") - beta_v = constant_op.constant([1.0, -3.6], name="beta") - gamma = gamma_lib.GammaWithSoftplusConcentrationRate( - concentration=alpha_v, rate=beta_v) - self.assertAllEqual(nn_ops.softplus(alpha_v).eval(), - gamma.concentration.eval()) - self.assertAllEqual(nn_ops.softplus(beta_v).eval(), - gamma.rate.eval()) - - def testGammaGammaKL(self): - alpha0 = np.array([3.]) - beta0 = np.array([1., 2., 3., 1.5, 2.5, 3.5]) - - alpha1 = np.array([0.4]) - beta1 = np.array([0.5, 1., 1.5, 2., 2.5, 3.]) - - # Build graph. - with self.test_session() as sess: - g0 = gamma_lib.Gamma(concentration=alpha0, rate=beta0) - g1 = gamma_lib.Gamma(concentration=alpha1, rate=beta1) - x = g0.sample(int(1e4), seed=0) - kl_sample = math_ops.reduce_mean(g0.log_prob(x) - g1.log_prob(x), 0) - kl_actual = kullback_leibler.kl_divergence(g0, g1) - - # Execute graph. 
- [kl_sample_, kl_actual_] = sess.run([kl_sample, kl_actual]) - - kl_expected = ((alpha0 - alpha1) * special.digamma(alpha0) - + special.gammaln(alpha1) - - special.gammaln(alpha0) - + alpha1 * np.log(beta0) - - alpha1 * np.log(beta1) - + alpha0 * (beta1 / beta0 - 1.)) - - self.assertEqual(beta0.shape, kl_actual.get_shape()) - self.assertAllClose(kl_expected, kl_actual_, atol=0., rtol=1e-6) - self.assertAllClose(kl_sample_, kl_actual_, atol=0., rtol=1e-2) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/laplace_test.py b/tensorflow/contrib/distributions/python/kernel_tests/laplace_test.py deleted file mode 100644 index 1f58d495f0..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/laplace_test.py +++ /dev/null @@ -1,318 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from scipy import stats -from tensorflow.contrib.distributions.python.ops import laplace as laplace_lib -from tensorflow.python.client import session -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import nn_ops -from tensorflow.python.platform import test - - -class LaplaceTest(test.TestCase): - - def testLaplaceShape(self): - with self.test_session(): - loc = constant_op.constant([3.0] * 5) - scale = constant_op.constant(11.0) - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - - self.assertEqual(laplace.batch_shape_tensor().eval(), (5,)) - self.assertEqual(laplace.batch_shape, tensor_shape.TensorShape([5])) - self.assertAllEqual(laplace.event_shape_tensor().eval(), []) - self.assertEqual(laplace.event_shape, tensor_shape.TensorShape([])) - - def testLaplaceLogPDF(self): - with self.test_session(): - batch_size = 6 - loc = constant_op.constant([2.0] * batch_size) - scale = constant_op.constant([3.0] * batch_size) - loc_v = 2.0 - scale_v = 3.0 - x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - expected_log_pdf = stats.laplace.logpdf(x, loc_v, scale=scale_v) - log_pdf = laplace.log_prob(x) - self.assertEqual(log_pdf.get_shape(), (6,)) - self.assertAllClose(log_pdf.eval(), expected_log_pdf) - - pdf = laplace.prob(x) - self.assertEqual(pdf.get_shape(), (6,)) - self.assertAllClose(pdf.eval(), np.exp(expected_log_pdf)) - - def testLaplaceLogPDFMultidimensional(self): - with self.test_session(): - batch_size = 6 - loc = constant_op.constant([[2.0, 4.0]] * batch_size) - scale = constant_op.constant([[3.0, 4.0]] * batch_size) - loc_v = np.array([2.0, 4.0]) - scale_v = np.array([3.0, 4.0]) - x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - 
expected_log_pdf = stats.laplace.logpdf(x, loc_v, scale=scale_v) - log_pdf = laplace.log_prob(x) - log_pdf_values = log_pdf.eval() - self.assertEqual(log_pdf.get_shape(), (6, 2)) - self.assertAllClose(log_pdf_values, expected_log_pdf) - - pdf = laplace.prob(x) - pdf_values = pdf.eval() - self.assertEqual(pdf.get_shape(), (6, 2)) - self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) - - def testLaplaceLogPDFMultidimensionalBroadcasting(self): - with self.test_session(): - batch_size = 6 - loc = constant_op.constant([[2.0, 4.0]] * batch_size) - scale = constant_op.constant(3.0) - loc_v = np.array([2.0, 4.0]) - scale_v = 3.0 - x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - expected_log_pdf = stats.laplace.logpdf(x, loc_v, scale=scale_v) - log_pdf = laplace.log_prob(x) - log_pdf_values = log_pdf.eval() - self.assertEqual(log_pdf.get_shape(), (6, 2)) - self.assertAllClose(log_pdf_values, expected_log_pdf) - - pdf = laplace.prob(x) - pdf_values = pdf.eval() - self.assertEqual(pdf.get_shape(), (6, 2)) - self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) - - def testLaplaceCDF(self): - with self.test_session(): - batch_size = 6 - loc = constant_op.constant([2.0] * batch_size) - scale = constant_op.constant([3.0] * batch_size) - loc_v = 2.0 - scale_v = 3.0 - x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) - - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - expected_cdf = stats.laplace.cdf(x, loc_v, scale=scale_v) - - cdf = laplace.cdf(x) - self.assertEqual(cdf.get_shape(), (6,)) - self.assertAllClose(cdf.eval(), expected_cdf) - - def testLaplaceLogCDF(self): - with self.test_session(): - batch_size = 6 - loc = constant_op.constant([2.0] * batch_size) - scale = constant_op.constant([3.0] * batch_size) - loc_v = 2.0 - scale_v = 3.0 - x = np.array([-2.5, 2.5, -4.0, 0.1, 1.0, 2.0], dtype=np.float32) - - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - expected_cdf = stats.laplace.logcdf(x, loc_v, scale=scale_v) - - cdf = laplace.log_cdf(x) - self.assertEqual(cdf.get_shape(), (6,)) - self.assertAllClose(cdf.eval(), expected_cdf) - - def testLaplaceLogSurvivalFunction(self): - with self.test_session(): - batch_size = 6 - loc = constant_op.constant([2.0] * batch_size) - scale = constant_op.constant([3.0] * batch_size) - loc_v = 2.0 - scale_v = 3.0 - x = np.array([-2.5, 2.5, -4.0, 0.1, 1.0, 2.0], dtype=np.float32) - - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - expected_sf = stats.laplace.logsf(x, loc_v, scale=scale_v) - - sf = laplace.log_survival_function(x) - self.assertEqual(sf.get_shape(), (6,)) - self.assertAllClose(sf.eval(), expected_sf) - - def testLaplaceMean(self): - with self.test_session(): - loc_v = np.array([1.0, 3.0, 2.5]) - scale_v = np.array([1.0, 4.0, 5.0]) - laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) - expected_means = stats.laplace.mean(loc_v, scale=scale_v) - self.assertEqual(laplace.mean().get_shape(), (3,)) - self.assertAllClose(laplace.mean().eval(), expected_means) - - def testLaplaceMode(self): - with self.test_session(): - loc_v = np.array([0.5, 3.0, 2.5]) - scale_v = np.array([1.0, 4.0, 5.0]) - laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) - self.assertEqual(laplace.mode().get_shape(), (3,)) - self.assertAllClose(laplace.mode().eval(), loc_v) - - def testLaplaceVariance(self): - with self.test_session(): - loc_v = np.array([1.0, 3.0, 2.5]) - scale_v = np.array([1.0, 4.0, 5.0]) - laplace = laplace_lib.Laplace(loc=loc_v, 
scale=scale_v) - expected_variances = stats.laplace.var(loc_v, scale=scale_v) - self.assertEqual(laplace.variance().get_shape(), (3,)) - self.assertAllClose(laplace.variance().eval(), expected_variances) - - def testLaplaceStd(self): - with self.test_session(): - loc_v = np.array([1.0, 3.0, 2.5]) - scale_v = np.array([1.0, 4.0, 5.0]) - laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) - expected_stddev = stats.laplace.std(loc_v, scale=scale_v) - self.assertEqual(laplace.stddev().get_shape(), (3,)) - self.assertAllClose(laplace.stddev().eval(), expected_stddev) - - def testLaplaceEntropy(self): - with self.test_session(): - loc_v = np.array([1.0, 3.0, 2.5]) - scale_v = np.array([1.0, 4.0, 5.0]) - expected_entropy = stats.laplace.entropy(loc_v, scale=scale_v) - laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) - self.assertEqual(laplace.entropy().get_shape(), (3,)) - self.assertAllClose(laplace.entropy().eval(), expected_entropy) - - def testLaplaceSample(self): - with session.Session(): - loc_v = 4.0 - scale_v = 3.0 - loc = constant_op.constant(loc_v) - scale = constant_op.constant(scale_v) - n = 100000 - laplace = laplace_lib.Laplace(loc=loc, scale=scale) - samples = laplace.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(samples.get_shape(), (n,)) - self.assertEqual(sample_values.shape, (n,)) - self.assertAllClose( - sample_values.mean(), - stats.laplace.mean( - loc_v, scale=scale_v), - rtol=0.05, - atol=0.) - self.assertAllClose( - sample_values.var(), - stats.laplace.var(loc_v, scale=scale_v), - rtol=0.05, - atol=0.) - self.assertTrue(self._kstest(loc_v, scale_v, sample_values)) - - def testLaplaceSampleMultiDimensional(self): - with session.Session(): - loc_v = np.array([np.arange(1, 101, dtype=np.float32)]) # 1 x 100 - scale_v = np.array([np.arange(1, 11, dtype=np.float32)]).T # 10 x 1 - laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) - n = 10000 - samples = laplace.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(samples.get_shape(), (n, 10, 100)) - self.assertEqual(sample_values.shape, (n, 10, 100)) - zeros = np.zeros_like(loc_v + scale_v) # 10 x 100 - loc_bc = loc_v + zeros - scale_bc = scale_v + zeros - self.assertAllClose( - sample_values.mean(axis=0), - stats.laplace.mean( - loc_bc, scale=scale_bc), - rtol=0.35, - atol=0.) - self.assertAllClose( - sample_values.var(axis=0), - stats.laplace.var(loc_bc, scale=scale_bc), - rtol=0.10, - atol=0.) - fails = 0 - trials = 0 - for ai, a in enumerate(np.reshape(loc_v, [-1])): - for bi, b in enumerate(np.reshape(scale_v, [-1])): - s = sample_values[:, bi, ai] - trials += 1 - fails += 0 if self._kstest(a, b, s) else 1 - self.assertLess(fails, trials * 0.03) - - def _kstest(self, loc, scale, samples): - # Uses the Kolmogorov-Smirnov test for goodness of fit. - ks, _ = stats.kstest(samples, stats.laplace(loc, scale=scale).cdf) - # Return True when the test passes. - return ks < 0.02 - - def testLaplacePdfOfSampleMultiDims(self): - with session.Session() as sess: - laplace = laplace_lib.Laplace(loc=[7., 11.], scale=[[5.], [6.]]) - num = 50000 - samples = laplace.sample(num, seed=137) - pdfs = laplace.prob(samples) - sample_vals, pdf_vals = sess.run([samples, pdfs]) - self.assertEqual(samples.get_shape(), (num, 2, 2)) - self.assertEqual(pdfs.get_shape(), (num, 2, 2)) - self.assertAllClose( - stats.laplace.mean( - [[7., 11.], [7., 11.]], scale=np.array([[5., 5.], [6., 6.]])), - sample_vals.mean(axis=0), - rtol=0.05, - atol=0.) 
- self.assertAllClose( - stats.laplace.var([[7., 11.], [7., 11.]], - scale=np.array([[5., 5.], [6., 6.]])), - sample_vals.var(axis=0), - rtol=0.05, - atol=0.) - self._assertIntegral(sample_vals[:, 0, 0], pdf_vals[:, 0, 0], err=0.02) - self._assertIntegral(sample_vals[:, 0, 1], pdf_vals[:, 0, 1], err=0.02) - self._assertIntegral(sample_vals[:, 1, 0], pdf_vals[:, 1, 0], err=0.02) - self._assertIntegral(sample_vals[:, 1, 1], pdf_vals[:, 1, 1], err=0.02) - - def _assertIntegral(self, sample_vals, pdf_vals, err=1e-3): - s_p = zip(sample_vals, pdf_vals) - prev = (0, 0) - total = 0 - for k in sorted(s_p, key=lambda x: x[0]): - pair_pdf = (k[1] + prev[1]) / 2 - total += (k[0] - prev[0]) * pair_pdf - prev = k - self.assertNear(1., total, err=err) - - def testLaplaceNonPositiveInitializationParamsRaises(self): - with self.test_session(): - loc_v = constant_op.constant(0.0, name="loc") - scale_v = constant_op.constant(-1.0, name="scale") - laplace = laplace_lib.Laplace( - loc=loc_v, scale=scale_v, validate_args=True) - with self.assertRaisesOpError("scale"): - laplace.mean().eval() - loc_v = constant_op.constant(1.0, name="loc") - scale_v = constant_op.constant(0.0, name="scale") - laplace = laplace_lib.Laplace( - loc=loc_v, scale=scale_v, validate_args=True) - with self.assertRaisesOpError("scale"): - laplace.mean().eval() - - def testLaplaceWithSoftplusScale(self): - with self.test_session(): - loc_v = constant_op.constant([0.0, 1.0], name="loc") - scale_v = constant_op.constant([-1.0, 2.0], name="scale") - laplace = laplace_lib.LaplaceWithSoftplusScale(loc=loc_v, scale=scale_v) - self.assertAllClose(nn_ops.softplus(scale_v).eval(), laplace.scale.eval()) - self.assertAllClose(loc_v.eval(), laplace.loc.eval()) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/multinomial_test.py b/tensorflow/contrib/distributions/python/kernel_tests/multinomial_test.py deleted file mode 100644 index b1c0c9f7a9..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/multinomial_test.py +++ /dev/null @@ -1,341 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
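The Gamma and Laplace tests above both lean on the same _assertIntegral idiom: sort the drawn samples, trapezoid-integrate the pdf evaluated at those samples, and check that the result lands near 1. A minimal standalone sketch of that idea, using only NumPy/SciPy with an assumed Laplace(loc=7, scale=5) rather than the removed TF ops:

    import numpy as np
    from scipy import stats

    # Draw samples from an assumed Laplace(loc=7, scale=5) and evaluate its pdf
    # at each sample, mirroring the sample()/prob() pairing in the tests above.
    dist = stats.laplace(loc=7., scale=5.)
    samples = dist.rvs(size=50000, random_state=137)
    pdf_vals = dist.pdf(samples)

    # Sort by sample value and integrate the pdf over the sampled range with
    # the trapezoid rule; with this many samples the total should be near 1.
    order = np.argsort(samples)
    total = np.trapz(pdf_vals[order], samples[order])
    assert abs(total - 1.) < 0.02

With 50,000 samples the tail mass outside the sampled range is negligible, which is why an error budget of 0.02 is enough for the checks above.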
-# ============================================================================== -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from tensorflow.contrib import distributions -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.platform import test - -ds = distributions - - -class MultinomialTest(test.TestCase): - - def setUp(self): - self._rng = np.random.RandomState(42) - - def testSimpleShapes(self): - with self.test_session(): - p = [.1, .3, .6] - dist = ds.Multinomial(total_count=1., probs=p) - self.assertEqual(3, dist.event_shape_tensor().eval()) - self.assertAllEqual([], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([3]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([]), dist.batch_shape) - - def testComplexShapes(self): - with self.test_session(): - p = 0.5 * np.ones([3, 2, 2], dtype=np.float32) - n = [[3., 2], [4, 5], [6, 7]] - dist = ds.Multinomial(total_count=n, probs=p) - self.assertEqual(2, dist.event_shape_tensor().eval()) - self.assertAllEqual([3, 2], dist.batch_shape_tensor().eval()) - self.assertEqual(tensor_shape.TensorShape([2]), dist.event_shape) - self.assertEqual(tensor_shape.TensorShape([3, 2]), dist.batch_shape) - - def testN(self): - p = [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]] - n = [[3.], [4]] - with self.test_session(): - dist = ds.Multinomial(total_count=n, probs=p) - self.assertEqual((2, 1), dist.total_count.get_shape()) - self.assertAllClose(n, dist.total_count.eval()) - - def testP(self): - p = [[0.1, 0.2, 0.7]] - with self.test_session(): - dist = ds.Multinomial(total_count=3., probs=p) - self.assertEqual((1, 3), dist.probs.get_shape()) - self.assertEqual((1, 3), dist.logits.get_shape()) - self.assertAllClose(p, dist.probs.eval()) - - def testLogits(self): - p = np.array([[0.1, 0.2, 0.7]], dtype=np.float32) - logits = np.log(p) - 50. - with self.test_session(): - multinom = ds.Multinomial(total_count=3., logits=logits) - self.assertEqual((1, 3), multinom.probs.get_shape()) - self.assertEqual((1, 3), multinom.logits.get_shape()) - self.assertAllClose(p, multinom.probs.eval()) - self.assertAllClose(logits, multinom.logits.eval()) - - def testPmfandCountsAgree(self): - p = [[0.1, 0.2, 0.7]] - n = [[5.]] - with self.test_session(): - dist = ds.Multinomial(total_count=n, probs=p, validate_args=True) - dist.prob([2., 3, 0]).eval() - dist.prob([3., 0, 2]).eval() - with self.assertRaisesOpError("must be non-negative"): - dist.prob([-1., 4, 2]).eval() - with self.assertRaisesOpError("counts must sum to `self.total_count`"): - dist.prob([3., 3, 0]).eval() - - def testPmfNonIntegerCounts(self): - p = [[0.1, 0.2, 0.7]] - n = [[5.]] - with self.test_session(): - # No errors with integer n. - multinom = ds.Multinomial(total_count=n, probs=p, validate_args=True) - multinom.prob([2., 1, 2]).eval() - multinom.prob([3., 0, 2]).eval() - # Counts don't sum to n. - with self.assertRaisesOpError("counts must sum to `self.total_count`"): - multinom.prob([2., 3, 2]).eval() - # Counts are non-integers. 
- x = array_ops.placeholder(dtypes.float32) - with self.assertRaisesOpError( - "cannot contain fractional components."): - multinom.prob(x).eval(feed_dict={x: [1.0, 2.5, 1.5]}) - - multinom = ds.Multinomial(total_count=n, probs=p, validate_args=False) - multinom.prob([1., 2., 2.]).eval() - # Non-integer arguments work. - multinom.prob([1.0, 2.5, 1.5]).eval() - - def testPmfBothZeroBatches(self): - with self.test_session(): - # Both zero-batches. No broadcast - p = [0.5, 0.5] - counts = [1., 0] - pmf = ds.Multinomial(total_count=1., probs=p).prob(counts) - self.assertAllClose(0.5, pmf.eval()) - self.assertEqual((), pmf.get_shape()) - - def testPmfBothZeroBatchesNontrivialN(self): - with self.test_session(): - # Both zero-batches. No broadcast - p = [0.1, 0.9] - counts = [3., 2] - dist = ds.Multinomial(total_count=5., probs=p) - pmf = dist.prob(counts) - # 5 choose 3 = 5 choose 2 = 10. 10 * (.9)^2 * (.1)^3 = 81/10000. - self.assertAllClose(81. / 10000, pmf.eval()) - self.assertEqual((), pmf.get_shape()) - - def testPmfPStretchedInBroadcastWhenSameRank(self): - with self.test_session(): - p = [[0.1, 0.9]] - counts = [[1., 0], [0, 1]] - pmf = ds.Multinomial(total_count=1., probs=p).prob(counts) - self.assertAllClose([0.1, 0.9], pmf.eval()) - self.assertEqual((2), pmf.get_shape()) - - def testPmfPStretchedInBroadcastWhenLowerRank(self): - with self.test_session(): - p = [0.1, 0.9] - counts = [[1., 0], [0, 1]] - pmf = ds.Multinomial(total_count=1., probs=p).prob(counts) - self.assertAllClose([0.1, 0.9], pmf.eval()) - self.assertEqual((2), pmf.get_shape()) - - def testPmfCountsStretchedInBroadcastWhenSameRank(self): - with self.test_session(): - p = [[0.1, 0.9], [0.7, 0.3]] - counts = [[1., 0]] - pmf = ds.Multinomial(total_count=1., probs=p).prob(counts) - self.assertAllClose(pmf.eval(), [0.1, 0.7]) - self.assertEqual((2), pmf.get_shape()) - - def testPmfCountsStretchedInBroadcastWhenLowerRank(self): - with self.test_session(): - p = [[0.1, 0.9], [0.7, 0.3]] - counts = [1., 0] - pmf = ds.Multinomial(total_count=1., probs=p).prob(counts) - self.assertAllClose(pmf.eval(), [0.1, 0.7]) - self.assertEqual(pmf.get_shape(), (2)) - - def testPmfShapeCountsStretchedN(self): - with self.test_session(): - # [2, 2, 2] - p = [[[0.1, 0.9], [0.1, 0.9]], [[0.7, 0.3], [0.7, 0.3]]] - # [2, 2] - n = [[3., 3], [3, 3]] - # [2] - counts = [2., 1] - pmf = ds.Multinomial(total_count=n, probs=p).prob(counts) - pmf.eval() - self.assertEqual(pmf.get_shape(), (2, 2)) - - def testPmfShapeCountsPStretchedN(self): - with self.test_session(): - p = [0.1, 0.9] - counts = [3., 2] - n = np.full([4, 3], 5., dtype=np.float32) - pmf = ds.Multinomial(total_count=n, probs=p).prob(counts) - pmf.eval() - self.assertEqual((4, 3), pmf.get_shape()) - - def testMultinomialMean(self): - with self.test_session(): - n = 5. - p = [0.1, 0.2, 0.7] - dist = ds.Multinomial(total_count=n, probs=p) - expected_means = 5 * np.array(p, dtype=np.float32) - self.assertEqual((3,), dist.mean().get_shape()) - self.assertAllClose(expected_means, dist.mean().eval()) - - def testMultinomialCovariance(self): - with self.test_session(): - n = 5. - p = [0.1, 0.2, 0.7] - dist = ds.Multinomial(total_count=n, probs=p) - expected_covariances = [[9. / 20, -1 / 10, -7 / 20], - [-1 / 10, 4 / 5, -7 / 10], - [-7 / 20, -7 / 10, 21 / 20]] - self.assertEqual((3, 3), dist.covariance().get_shape()) - self.assertAllClose(expected_covariances, dist.covariance().eval()) - - def testMultinomialCovarianceBatch(self): - with self.test_session(): - # Shape [2] - n = [5.] 
* 2 - # Shape [4, 1, 2] - p = [[[0.1, 0.9]], [[0.1, 0.9]]] * 2 - dist = ds.Multinomial(total_count=n, probs=p) - # Shape [2, 2] - inner_var = [[9. / 20, -9 / 20], [-9 / 20, 9 / 20]] - # Shape [4, 2, 2, 2] - expected_covariances = [[inner_var, inner_var]] * 4 - self.assertEqual((4, 2, 2, 2), dist.covariance().get_shape()) - self.assertAllClose(expected_covariances, dist.covariance().eval()) - - def testCovarianceMultidimensional(self): - # Shape [3, 5, 4] - p = np.random.dirichlet([.25, .25, .25, .25], [3, 5]).astype(np.float32) - # Shape [6, 3, 3] - p2 = np.random.dirichlet([.3, .3, .4], [6, 3]).astype(np.float32) - - ns = np.random.randint(low=1, high=11, size=[3, 5]).astype(np.float32) - ns2 = np.random.randint(low=1, high=11, size=[6, 1]).astype(np.float32) - - with self.test_session(): - dist = ds.Multinomial(ns, p) - dist2 = ds.Multinomial(ns2, p2) - - covariance = dist.covariance() - covariance2 = dist2.covariance() - self.assertEqual((3, 5, 4, 4), covariance.get_shape()) - self.assertEqual((6, 3, 3, 3), covariance2.get_shape()) - - def testCovarianceFromSampling(self): - # We will test mean, cov, var, stddev on a DirichletMultinomial constructed - # via broadcast between alpha, n. - theta = np.array([[1., 2, 3], - [2.5, 4, 0.01]], dtype=np.float32) - theta /= np.sum(theta, 1)[..., array_ops.newaxis] - # Ideally we'd be able to test broadcasting but, the multinomial sampler - # doesn't support different total counts. - n = np.float32(5) - with self.test_session() as sess: - dist = ds.Multinomial(n, theta) # batch_shape=[2], event_shape=[3] - x = dist.sample(int(250e3), seed=1) - sample_mean = math_ops.reduce_mean(x, 0) - x_centered = x - sample_mean[array_ops.newaxis, ...] - sample_cov = math_ops.reduce_mean(math_ops.matmul( - x_centered[..., array_ops.newaxis], - x_centered[..., array_ops.newaxis, :]), 0) - sample_var = array_ops.matrix_diag_part(sample_cov) - sample_stddev = math_ops.sqrt(sample_var) - [ - sample_mean_, - sample_cov_, - sample_var_, - sample_stddev_, - analytic_mean, - analytic_cov, - analytic_var, - analytic_stddev, - ] = sess.run([ - sample_mean, - sample_cov, - sample_var, - sample_stddev, - dist.mean(), - dist.covariance(), - dist.variance(), - dist.stddev(), - ]) - self.assertAllClose(sample_mean_, analytic_mean, atol=0., rtol=0.01) - self.assertAllClose(sample_cov_, analytic_cov, atol=0., rtol=0.01) - self.assertAllClose(sample_var_, analytic_var, atol=0., rtol=0.01) - self.assertAllClose(sample_stddev_, analytic_stddev, atol=0., rtol=0.01) - - def testSampleUnbiasedNonScalarBatch(self): - with self.test_session() as sess: - dist = ds.Multinomial( - total_count=5., - logits=math_ops.log(2. * self._rng.rand(4, 3, 2).astype(np.float32))) - n = int(3e3) - x = dist.sample(n, seed=0) - sample_mean = math_ops.reduce_mean(x, 0) - # Cyclically rotate event dims left. 
- x_centered = array_ops.transpose(x - sample_mean, [1, 2, 3, 0]) - sample_covariance = math_ops.matmul( - x_centered, x_centered, adjoint_b=True) / n - [ - sample_mean_, - sample_covariance_, - actual_mean_, - actual_covariance_, - ] = sess.run([ - sample_mean, - sample_covariance, - dist.mean(), - dist.covariance(), - ]) - self.assertAllEqual([4, 3, 2], sample_mean.get_shape()) - self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.07) - self.assertAllEqual([4, 3, 2, 2], sample_covariance.get_shape()) - self.assertAllClose( - actual_covariance_, sample_covariance_, atol=0., rtol=0.10) - - def testSampleUnbiasedScalarBatch(self): - with self.test_session() as sess: - dist = ds.Multinomial( - total_count=5., - logits=math_ops.log(2. * self._rng.rand(4).astype(np.float32))) - n = int(5e3) - x = dist.sample(n, seed=0) - sample_mean = math_ops.reduce_mean(x, 0) - x_centered = x - sample_mean # Already transposed to [n, 2]. - sample_covariance = math_ops.matmul( - x_centered, x_centered, adjoint_a=True) / n - [ - sample_mean_, - sample_covariance_, - actual_mean_, - actual_covariance_, - ] = sess.run([ - sample_mean, - sample_covariance, - dist.mean(), - dist.covariance(), - ]) - self.assertAllEqual([4], sample_mean.get_shape()) - self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.07) - self.assertAllEqual([4, 4], sample_covariance.get_shape()) - self.assertAllClose( - actual_covariance_, sample_covariance_, atol=0., rtol=0.10) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/student_t_test.py b/tensorflow/contrib/distributions/python/kernel_tests/student_t_test.py deleted file mode 100644 index 209ef696ca..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/student_t_test.py +++ /dev/null @@ -1,475 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""Tests for Student t distribution.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -import numpy as np -from scipy import stats -from tensorflow.contrib import distributions -from tensorflow.contrib.distributions.python.ops import student_t -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import random_seed -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn_ops -from tensorflow.python.platform import test - -ds = distributions - - -class StudentTTest(test.TestCase): - - def testStudentPDFAndLogPDF(self): - with self.test_session(): - batch_size = 6 - df = constant_op.constant([3.] * batch_size) - mu = constant_op.constant([7.] * batch_size) - sigma = constant_op.constant([8.] * batch_size) - df_v = 3. - mu_v = 7. - sigma_v = 8. 
- t = np.array([-2.5, 2.5, 8., 0., -1., 2.], dtype=np.float32) - student = ds.StudentT(df, loc=mu, scale=-sigma) - - log_pdf = student.log_prob(t) - self.assertEquals(log_pdf.get_shape(), (6,)) - log_pdf_values = log_pdf.eval() - pdf = student.prob(t) - self.assertEquals(pdf.get_shape(), (6,)) - pdf_values = pdf.eval() - - expected_log_pdf = stats.t.logpdf(t, df_v, loc=mu_v, scale=sigma_v) - expected_pdf = stats.t.pdf(t, df_v, loc=mu_v, scale=sigma_v) - self.assertAllClose(expected_log_pdf, log_pdf_values) - self.assertAllClose(np.log(expected_pdf), log_pdf_values) - self.assertAllClose(expected_pdf, pdf_values) - self.assertAllClose(np.exp(expected_log_pdf), pdf_values) - - def testStudentLogPDFMultidimensional(self): - with self.test_session(): - batch_size = 6 - df = constant_op.constant([[1.5, 7.2]] * batch_size) - mu = constant_op.constant([[3., -3.]] * batch_size) - sigma = constant_op.constant([[-math.sqrt(10.), math.sqrt(15.)]] * - batch_size) - df_v = np.array([1.5, 7.2]) - mu_v = np.array([3., -3.]) - sigma_v = np.array([np.sqrt(10.), np.sqrt(15.)]) - t = np.array([[-2.5, 2.5, 4., 0., -1., 2.]], dtype=np.float32).T - student = ds.StudentT(df, loc=mu, scale=sigma) - log_pdf = student.log_prob(t) - log_pdf_values = log_pdf.eval() - self.assertEqual(log_pdf.get_shape(), (6, 2)) - pdf = student.prob(t) - pdf_values = pdf.eval() - self.assertEqual(pdf.get_shape(), (6, 2)) - expected_log_pdf = stats.t.logpdf(t, df_v, loc=mu_v, scale=sigma_v) - expected_pdf = stats.t.pdf(t, df_v, loc=mu_v, scale=sigma_v) - self.assertAllClose(expected_log_pdf, log_pdf_values) - self.assertAllClose(np.log(expected_pdf), log_pdf_values) - self.assertAllClose(expected_pdf, pdf_values) - self.assertAllClose(np.exp(expected_log_pdf), pdf_values) - - def testStudentCDFAndLogCDF(self): - with self.test_session(): - batch_size = 6 - df = constant_op.constant([3.] * batch_size) - mu = constant_op.constant([7.] * batch_size) - sigma = constant_op.constant([-8.] * batch_size) - df_v = 3. - mu_v = 7. - sigma_v = 8. 
- t = np.array([-2.5, 2.5, 8., 0., -1., 2.], dtype=np.float32) - student = student_t.StudentT(df, loc=mu, scale=sigma) - - log_cdf = student.log_cdf(t) - self.assertEquals(log_cdf.get_shape(), (6,)) - log_cdf_values = log_cdf.eval() - cdf = student.cdf(t) - self.assertEquals(cdf.get_shape(), (6,)) - cdf_values = cdf.eval() - - expected_log_cdf = stats.t.logcdf(t, df_v, loc=mu_v, scale=sigma_v) - expected_cdf = stats.t.cdf(t, df_v, loc=mu_v, scale=sigma_v) - self.assertAllClose(expected_log_cdf, log_cdf_values, atol=0., rtol=1e-5) - self.assertAllClose( - np.log(expected_cdf), log_cdf_values, atol=0., rtol=1e-5) - self.assertAllClose(expected_cdf, cdf_values, atol=0., rtol=1e-5) - self.assertAllClose( - np.exp(expected_log_cdf), cdf_values, atol=0., rtol=1e-5) - - def testStudentEntropy(self): - df_v = np.array([[2., 3., 7.]]) # 1x3 - mu_v = np.array([[1., -1, 0]]) # 1x3 - sigma_v = np.array([[1., -2., 3.]]).T # transposed => 3x1 - with self.test_session(): - student = ds.StudentT(df=df_v, loc=mu_v, scale=sigma_v) - ent = student.entropy() - ent_values = ent.eval() - - # Help scipy broadcast to 3x3 - ones = np.array([[1, 1, 1]]) - sigma_bc = np.abs(sigma_v) * ones - mu_bc = ones.T * mu_v - df_bc = ones.T * df_v - expected_entropy = stats.t.entropy( - np.reshape(df_bc, [-1]), - loc=np.reshape(mu_bc, [-1]), - scale=np.reshape(sigma_bc, [-1])) - expected_entropy = np.reshape(expected_entropy, df_bc.shape) - self.assertAllClose(expected_entropy, ent_values) - - def testStudentSample(self): - with self.test_session(): - df = constant_op.constant(4.) - mu = constant_op.constant(3.) - sigma = constant_op.constant(-math.sqrt(10.)) - df_v = 4. - mu_v = 3. - sigma_v = np.sqrt(10.) - n = constant_op.constant(200000) - student = ds.StudentT(df=df, loc=mu, scale=sigma) - samples = student.sample(n, seed=123456) - sample_values = samples.eval() - n_val = 200000 - self.assertEqual(sample_values.shape, (n_val,)) - self.assertAllClose(sample_values.mean(), mu_v, rtol=1e-2, atol=0) - self.assertAllClose( - sample_values.var(), - sigma_v**2 * df_v / (df_v - 2), - rtol=1e-2, - atol=0) - self._checkKLApprox(df_v, mu_v, sigma_v, sample_values) - - # Test that sampling with the same seed twice gives the same results. - def testStudentSampleMultipleTimes(self): - with self.test_session(): - df = constant_op.constant(4.) - mu = constant_op.constant(3.) - sigma = constant_op.constant(math.sqrt(10.)) - n = constant_op.constant(100) - - random_seed.set_random_seed(654321) - student = ds.StudentT(df=df, loc=mu, scale=sigma, name="student_t1") - samples1 = student.sample(n, seed=123456).eval() - - random_seed.set_random_seed(654321) - student2 = ds.StudentT(df=df, loc=mu, scale=sigma, name="student_t2") - samples2 = student2.sample(n, seed=123456).eval() - - self.assertAllClose(samples1, samples2) - - def testStudentSampleSmallDfNoNan(self): - with self.test_session(): - df_v = [1e-1, 1e-5, 1e-10, 1e-20] - df = constant_op.constant(df_v) - n = constant_op.constant(200000) - student = ds.StudentT(df=df, loc=1., scale=1.) 
- samples = student.sample(n, seed=123456) - sample_values = samples.eval() - n_val = 200000 - self.assertEqual(sample_values.shape, (n_val, 4)) - self.assertTrue(np.all(np.logical_not(np.isnan(sample_values)))) - - def testStudentSampleMultiDimensional(self): - with self.test_session(): - batch_size = 7 - df = constant_op.constant([[3., 7.]] * batch_size) - mu = constant_op.constant([[3., -3.]] * batch_size) - sigma = constant_op.constant([[math.sqrt(10.), math.sqrt(15.)]] * - batch_size) - df_v = [3., 7.] - mu_v = [3., -3.] - sigma_v = [np.sqrt(10.), np.sqrt(15.)] - n = constant_op.constant(200000) - student = ds.StudentT(df=df, loc=mu, scale=sigma) - samples = student.sample(n, seed=123456) - sample_values = samples.eval() - self.assertEqual(samples.get_shape(), (200000, batch_size, 2)) - self.assertAllClose( - sample_values[:, 0, 0].mean(), mu_v[0], rtol=1e-2, atol=0) - self.assertAllClose( - sample_values[:, 0, 0].var(), - sigma_v[0]**2 * df_v[0] / (df_v[0] - 2), - rtol=1e-1, - atol=0) - self._checkKLApprox(df_v[0], mu_v[0], sigma_v[0], sample_values[:, 0, 0]) - self.assertAllClose( - sample_values[:, 0, 1].mean(), mu_v[1], rtol=1e-2, atol=0) - self.assertAllClose( - sample_values[:, 0, 1].var(), - sigma_v[1]**2 * df_v[1] / (df_v[1] - 2), - rtol=1e-1, - atol=0) - self._checkKLApprox(df_v[0], mu_v[0], sigma_v[0], sample_values[:, 0, 1]) - - def _checkKLApprox(self, df, mu, sigma, samples): - n = samples.size - np.random.seed(137) - sample_scipy = stats.t.rvs(df, loc=mu, scale=sigma, size=n) - covg = 0.99 - r = stats.t.interval(covg, df, loc=mu, scale=sigma) - bins = 100 - hist, _ = np.histogram(samples, bins=bins, range=r) - hist_scipy, _ = np.histogram(sample_scipy, bins=bins, range=r) - self.assertGreater(hist.sum(), n * (covg - .01)) - self.assertGreater(hist_scipy.sum(), n * (covg - .01)) - hist_min1 = hist + 1. # put at least one item in each bucket - hist_norm = hist_min1 / hist_min1.sum() - hist_scipy_min1 = hist_scipy + 1. 
# put at least one item in each bucket - hist_scipy_norm = hist_scipy_min1 / hist_scipy_min1.sum() - kl_appx = np.sum(np.log(hist_scipy_norm / hist_norm) * hist_scipy_norm) - self.assertLess(kl_appx, 1) - - def testBroadcastingParams(self): - - def _check(student): - self.assertEqual(student.mean().get_shape(), (3,)) - self.assertEqual(student.variance().get_shape(), (3,)) - self.assertEqual(student.entropy().get_shape(), (3,)) - self.assertEqual(student.log_prob(2.).get_shape(), (3,)) - self.assertEqual(student.prob(2.).get_shape(), (3,)) - self.assertEqual(student.sample(37, seed=123456).get_shape(), (37, 3,)) - - _check(ds.StudentT(df=[2., 3., 4.,], loc=2., scale=1.)) - _check(ds.StudentT(df=7., loc=[2., 3., 4.,], scale=1.)) - _check(ds.StudentT(df=7., loc=3., scale=[2., 3., 4.,])) - - def testBroadcastingPdfArgs(self): - - def _assert_shape(student, arg, shape): - self.assertEqual(student.log_prob(arg).get_shape(), shape) - self.assertEqual(student.prob(arg).get_shape(), shape) - - def _check(student): - _assert_shape(student, 2., (3,)) - xs = np.array([2., 3., 4.], dtype=np.float32) - _assert_shape(student, xs, (3,)) - xs = np.array([xs]) - _assert_shape(student, xs, (1, 3)) - xs = xs.T - _assert_shape(student, xs, (3, 3)) - - _check(ds.StudentT(df=[2., 3., 4.,], loc=2., scale=1.)) - _check(ds.StudentT(df=7., loc=[2., 3., 4.,], scale=1.)) - _check(ds.StudentT(df=7., loc=3., scale=[2., 3., 4.,])) - - def _check2d(student): - _assert_shape(student, 2., (1, 3)) - xs = np.array([2., 3., 4.], dtype=np.float32) - _assert_shape(student, xs, (1, 3)) - xs = np.array([xs]) - _assert_shape(student, xs, (1, 3)) - xs = xs.T - _assert_shape(student, xs, (3, 3)) - - _check2d(ds.StudentT(df=[[2., 3., 4.,]], loc=2., scale=1.)) - _check2d(ds.StudentT(df=7., loc=[[2., 3., 4.,]], scale=1.)) - _check2d(ds.StudentT(df=7., loc=3., scale=[[2., 3., 4.,]])) - - def _check2d_rows(student): - _assert_shape(student, 2., (3, 1)) - xs = np.array([2., 3., 4.], dtype=np.float32) # (3,) - _assert_shape(student, xs, (3, 3)) - xs = np.array([xs]) # (1,3) - _assert_shape(student, xs, (3, 3)) - xs = xs.T # (3,1) - _assert_shape(student, xs, (3, 1)) - - _check2d_rows(ds.StudentT(df=[[2.], [3.], [4.]], loc=2., scale=1.)) - _check2d_rows(ds.StudentT(df=7., loc=[[2.], [3.], [4.]], scale=1.)) - _check2d_rows(ds.StudentT(df=7., loc=3., scale=[[2.], [3.], [4.]])) - - def testMeanAllowNanStatsIsFalseWorksWhenAllBatchMembersAreDefined(self): - with self.test_session(): - mu = [1., 3.3, 4.4] - student = ds.StudentT(df=[3., 5., 7.], loc=mu, scale=[3., 2., 1.]) - mean = student.mean().eval() - self.assertAllClose([1., 3.3, 4.4], mean) - - def testMeanAllowNanStatsIsFalseRaisesWhenBatchMemberIsUndefined(self): - with self.test_session(): - mu = [1., 3.3, 4.4] - student = ds.StudentT(df=[0.5, 5., 7.], loc=mu, scale=[3., 2., 1.], - allow_nan_stats=False) - with self.assertRaisesOpError("x < y"): - student.mean().eval() - - def testMeanAllowNanStatsIsTrueReturnsNaNForUndefinedBatchMembers(self): - with self.test_session(): - mu = [-2, 0., 1., 3.3, 4.4] - sigma = [5., 4., 3., 2., 1.] - student = ds.StudentT(df=[0.5, 1., 3., 5., 7.], loc=mu, scale=sigma, - allow_nan_stats=True) - mean = student.mean().eval() - self.assertAllClose([np.nan, np.nan, 1., 3.3, 4.4], mean) - - def testVarianceAllowNanStatsTrueReturnsNaNforUndefinedBatchMembers(self): - with self.test_session(): - # df = 0.5 ==> undefined mean ==> undefined variance. - # df = 1.5 ==> infinite variance. - df = [0.5, 1.5, 3., 5., 7.] 
- mu = [-2, 0., 1., 3.3, 4.4] - sigma = [5., 4., 3., 2., 1.] - student = ds.StudentT(df=df, loc=mu, scale=sigma, allow_nan_stats=True) - var = student.variance().eval() - ## scipy uses inf for variance when the mean is undefined. When mean is - # undefined we say variance is undefined as well. So test the first - # member of var, making sure it is NaN, then replace with inf and compare - # to scipy. - self.assertTrue(np.isnan(var[0])) - var[0] = np.inf - - expected_var = [ - stats.t.var(d, loc=m, scale=s) for (d, m, s) in zip(df, mu, sigma) - ] - self.assertAllClose(expected_var, var) - - def testVarianceAllowNanStatsFalseGivesCorrectValueForDefinedBatchMembers( - self): - with self.test_session(): - # df = 1.5 ==> infinite variance. - df = [1.5, 3., 5., 7.] - mu = [0., 1., 3.3, 4.4] - sigma = [4., 3., 2., 1.] - student = ds.StudentT(df=df, loc=mu, scale=sigma) - var = student.variance().eval() - - expected_var = [ - stats.t.var(d, loc=m, scale=s) for (d, m, s) in zip(df, mu, sigma) - ] - self.assertAllClose(expected_var, var) - - def testVarianceAllowNanStatsFalseRaisesForUndefinedBatchMembers(self): - with self.test_session(): - # df <= 1 ==> variance not defined - student = ds.StudentT(df=1., loc=0., scale=1., allow_nan_stats=False) - with self.assertRaisesOpError("x < y"): - student.variance().eval() - - with self.test_session(): - # df <= 1 ==> variance not defined - student = ds.StudentT(df=0.5, loc=0., scale=1., allow_nan_stats=False) - with self.assertRaisesOpError("x < y"): - student.variance().eval() - - def testStd(self): - with self.test_session(): - # Defined for all batch members. - df = [3.5, 5., 3., 5., 7.] - mu = [-2.2] - sigma = [5., 4., 3., 2., 1.] - student = ds.StudentT(df=df, loc=mu, scale=sigma) - # Test broadcast of mu across shape of df/sigma - stddev = student.stddev().eval() - mu *= len(df) - - expected_stddev = [ - stats.t.std(d, loc=m, scale=s) for (d, m, s) in zip(df, mu, sigma) - ] - self.assertAllClose(expected_stddev, stddev) - - def testMode(self): - with self.test_session(): - df = [0.5, 1., 3] - mu = [-1, 0., 1] - sigma = [5., 4., 3.] - student = ds.StudentT(df=df, loc=mu, scale=sigma) - # Test broadcast of mu across shape of df/sigma - mode = student.mode().eval() - self.assertAllClose([-1., 0, 1], mode) - - def testPdfOfSample(self): - with self.test_session() as sess: - student = ds.StudentT(df=3., loc=np.pi, scale=1.) - num = 20000 - samples = student.sample(num, seed=123456) - pdfs = student.prob(samples) - mean = student.mean() - mean_pdf = student.prob(student.mean()) - sample_vals, pdf_vals, mean_val, mean_pdf_val = sess.run( - [samples, pdfs, student.mean(), mean_pdf]) - self.assertEqual(samples.get_shape(), (num,)) - self.assertEqual(pdfs.get_shape(), (num,)) - self.assertEqual(mean.get_shape(), ()) - self.assertNear(np.pi, np.mean(sample_vals), err=0.02) - self.assertNear(np.pi, mean_val, err=1e-6) - self.assertNear(stats.t.pdf(np.pi, 3., loc=np.pi), mean_pdf_val, err=1e-6) - # Verify integral over sample*pdf ~= 1. - self._assertIntegral(sample_vals, pdf_vals, err=2e-3) - - def testPdfOfSampleMultiDims(self): - with self.test_session() as sess: - student = ds.StudentT(df=[7., 11.], loc=[[5.], [6.]], scale=3.) 
- self.assertAllEqual([], student.event_shape) - self.assertAllEqual([], student.event_shape_tensor().eval()) - self.assertAllEqual([2, 2], student.batch_shape) - self.assertAllEqual([2, 2], student.batch_shape_tensor().eval()) - num = 50000 - samples = student.sample(num, seed=123456) - pdfs = student.prob(samples) - sample_vals, pdf_vals = sess.run([samples, pdfs]) - self.assertEqual(samples.get_shape(), (num, 2, 2)) - self.assertEqual(pdfs.get_shape(), (num, 2, 2)) - self.assertNear(5., np.mean(sample_vals[:, 0, :]), err=.03) - self.assertNear(6., np.mean(sample_vals[:, 1, :]), err=.03) - self.assertNear( - stats.t.var(7., loc=0., scale=3.), # loc d.n. effect var - np.var(sample_vals[:, :, 0]), - err=.4) - self.assertNear( - stats.t.var(11., loc=0., scale=3.), # loc d.n. effect var - np.var(sample_vals[:, :, 1]), - err=.4) - self._assertIntegral(sample_vals[:, 0, 0], pdf_vals[:, 0, 0], err=0.02) - self._assertIntegral(sample_vals[:, 0, 1], pdf_vals[:, 0, 1], err=0.02) - self._assertIntegral(sample_vals[:, 1, 0], pdf_vals[:, 1, 0], err=0.02) - self._assertIntegral(sample_vals[:, 1, 1], pdf_vals[:, 1, 1], err=0.02) - - def _assertIntegral(self, sample_vals, pdf_vals, err=1.5e-3): - s_p = zip(sample_vals, pdf_vals) - prev = (sample_vals.min() - 1000, 0) - total = 0 - for k in sorted(s_p, key=lambda x: x[0]): - pair_pdf = (k[1] + prev[1]) / 2 - total += (k[0] - prev[0]) * pair_pdf - prev = k - self.assertNear(1., total, err=err) - - def testNegativeDofFails(self): - with self.test_session(): - student = ds.StudentT(df=[2, -5.], loc=0., scale=1., - validate_args=True, name="S") - with self.assertRaisesOpError(r"Condition x > 0 did not hold"): - student.mean().eval() - - def testStudentTWithAbsDfSoftplusScale(self): - with self.test_session(): - df = constant_op.constant([-3.2, -4.6]) - mu = constant_op.constant([-4.2, 3.4]) - sigma = constant_op.constant([-6.4, -8.8]) - student = ds.StudentTWithAbsDfSoftplusScale(df=df, loc=mu, scale=sigma) - self.assertAllClose( - math_ops.floor(math_ops.abs(df)).eval(), student.df.eval()) - self.assertAllClose(mu.eval(), student.loc.eval()) - self.assertAllClose(nn_ops.softplus(sigma).eval(), student.scale.eval()) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/kernel_tests/uniform_test.py b/tensorflow/contrib/distributions/python/kernel_tests/uniform_test.py deleted file mode 100644 index c3c97b98f0..0000000000 --- a/tensorflow/contrib/distributions/python/kernel_tests/uniform_test.py +++ /dev/null @@ -1,265 +0,0 @@ -# Copyright 2015 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
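The Student t tests above compare sample moments against scipy; the closed form they lean on is var = scale**2 * df / (df - 2) for df > 2 (undefined or infinite otherwise, which is what the allow_nan_stats cases probe). A SciPy-only sketch of that check, with assumed example parameters rather than the removed TF ops:

    import numpy as np
    from scipy import stats

    df, loc, scale = 7., 3., 2.  # assumed example values; any df > 2 works

    # Closed form used implicitly by the tests: var = scale**2 * df / (df - 2).
    analytic_var = scale**2 * df / (df - 2.)
    assert np.isclose(stats.t.var(df, loc=loc, scale=scale), analytic_var)

    # A moderately sized sample reproduces the analytic variance to within a
    # few percent, the same tolerance regime the deleted tests use.
    samples = stats.t.rvs(df, loc=loc, scale=scale, size=200000,
                          random_state=123456)
    assert np.isclose(samples.var(), analytic_var, rtol=0.05)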
-# ============================================================================== -"""Tests for Uniform distribution.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -from scipy import stats -from tensorflow.contrib.distributions.python.ops import uniform as uniform_lib -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import errors_impl -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.platform import test - - -class UniformTest(test.TestCase): - - def testUniformRange(self): - with self.test_session(): - a = 3.0 - b = 10.0 - uniform = uniform_lib.Uniform(low=a, high=b) - self.assertAllClose(a, uniform.low.eval()) - self.assertAllClose(b, uniform.high.eval()) - self.assertAllClose(b - a, uniform.range().eval()) - - def testUniformPDF(self): - with self.test_session(): - a = constant_op.constant([-3.0] * 5 + [15.0]) - b = constant_op.constant([11.0] * 5 + [20.0]) - uniform = uniform_lib.Uniform(low=a, high=b) - - a_v = -3.0 - b_v = 11.0 - x = np.array([-10.5, 4.0, 0.0, 10.99, 11.3, 17.0], dtype=np.float32) - - def _expected_pdf(): - pdf = np.zeros_like(x) + 1.0 / (b_v - a_v) - pdf[x > b_v] = 0.0 - pdf[x < a_v] = 0.0 - pdf[5] = 1.0 / (20.0 - 15.0) - return pdf - - expected_pdf = _expected_pdf() - - pdf = uniform.prob(x) - self.assertAllClose(expected_pdf, pdf.eval()) - - log_pdf = uniform.log_prob(x) - self.assertAllClose(np.log(expected_pdf), log_pdf.eval()) - - def testUniformShape(self): - with self.test_session(): - a = constant_op.constant([-3.0] * 5) - b = constant_op.constant(11.0) - uniform = uniform_lib.Uniform(low=a, high=b) - - self.assertEqual(uniform.batch_shape_tensor().eval(), (5,)) - self.assertEqual(uniform.batch_shape, tensor_shape.TensorShape([5])) - self.assertAllEqual(uniform.event_shape_tensor().eval(), []) - self.assertEqual(uniform.event_shape, tensor_shape.TensorShape([])) - - def testUniformPDFWithScalarEndpoint(self): - with self.test_session(): - a = constant_op.constant([0.0, 5.0]) - b = constant_op.constant(10.0) - uniform = uniform_lib.Uniform(low=a, high=b) - - x = np.array([0.0, 8.0], dtype=np.float32) - expected_pdf = np.array([1.0 / (10.0 - 0.0), 1.0 / (10.0 - 5.0)]) - - pdf = uniform.prob(x) - self.assertAllClose(expected_pdf, pdf.eval()) - - def testUniformCDF(self): - with self.test_session(): - batch_size = 6 - a = constant_op.constant([1.0] * batch_size) - b = constant_op.constant([11.0] * batch_size) - a_v = 1.0 - b_v = 11.0 - x = np.array([-2.5, 2.5, 4.0, 0.0, 10.99, 12.0], dtype=np.float32) - - uniform = uniform_lib.Uniform(low=a, high=b) - - def _expected_cdf(): - cdf = (x - a_v) / (b_v - a_v) - cdf[x >= b_v] = 1 - cdf[x < a_v] = 0 - return cdf - - cdf = uniform.cdf(x) - self.assertAllClose(_expected_cdf(), cdf.eval()) - - log_cdf = uniform.log_cdf(x) - self.assertAllClose(np.log(_expected_cdf()), log_cdf.eval()) - - def testUniformEntropy(self): - with self.test_session(): - a_v = np.array([1.0, 1.0, 1.0]) - b_v = np.array([[1.5, 2.0, 3.0]]) - uniform = uniform_lib.Uniform(low=a_v, high=b_v) - - expected_entropy = np.log(b_v - a_v) - self.assertAllClose(expected_entropy, uniform.entropy().eval()) - - def testUniformAssertMaxGtMin(self): - with self.test_session(): - a_v = np.array([1.0, 1.0, 1.0], dtype=np.float32) - b_v = np.array([1.0, 2.0, 3.0], dtype=np.float32) - uniform = uniform_lib.Uniform(low=a_v, 
high=b_v, validate_args=True) - - with self.assertRaisesWithPredicateMatch(errors_impl.InvalidArgumentError, - "x < y"): - uniform.low.eval() - - def testUniformSample(self): - with self.test_session(): - a = constant_op.constant([3.0, 4.0]) - b = constant_op.constant(13.0) - a1_v = 3.0 - a2_v = 4.0 - b_v = 13.0 - n = constant_op.constant(100000) - uniform = uniform_lib.Uniform(low=a, high=b) - - samples = uniform.sample(n, seed=137) - sample_values = samples.eval() - self.assertEqual(sample_values.shape, (100000, 2)) - self.assertAllClose( - sample_values[::, 0].mean(), (b_v + a1_v) / 2, atol=1e-2) - self.assertAllClose( - sample_values[::, 1].mean(), (b_v + a2_v) / 2, atol=1e-2) - self.assertFalse( - np.any(sample_values[::, 0] < a1_v) or np.any(sample_values >= b_v)) - self.assertFalse( - np.any(sample_values[::, 1] < a2_v) or np.any(sample_values >= b_v)) - - def _testUniformSampleMultiDimensional(self): - # DISABLED: Please enable this test once b/issues/30149644 is resolved. - with self.test_session(): - batch_size = 2 - a_v = [3.0, 22.0] - b_v = [13.0, 35.0] - a = constant_op.constant([a_v] * batch_size) - b = constant_op.constant([b_v] * batch_size) - - uniform = uniform_lib.Uniform(low=a, high=b) - - n_v = 100000 - n = constant_op.constant(n_v) - samples = uniform.sample(n) - self.assertEqual(samples.get_shape(), (n_v, batch_size, 2)) - - sample_values = samples.eval() - - self.assertFalse( - np.any(sample_values[:, 0, 0] < a_v[0]) or - np.any(sample_values[:, 0, 0] >= b_v[0])) - self.assertFalse( - np.any(sample_values[:, 0, 1] < a_v[1]) or - np.any(sample_values[:, 0, 1] >= b_v[1])) - - self.assertAllClose( - sample_values[:, 0, 0].mean(), (a_v[0] + b_v[0]) / 2, atol=1e-2) - self.assertAllClose( - sample_values[:, 0, 1].mean(), (a_v[1] + b_v[1]) / 2, atol=1e-2) - - def testUniformMean(self): - with self.test_session(): - a = 10.0 - b = 100.0 - uniform = uniform_lib.Uniform(low=a, high=b) - s_uniform = stats.uniform(loc=a, scale=b - a) - self.assertAllClose(uniform.mean().eval(), s_uniform.mean()) - - def testUniformVariance(self): - with self.test_session(): - a = 10.0 - b = 100.0 - uniform = uniform_lib.Uniform(low=a, high=b) - s_uniform = stats.uniform(loc=a, scale=b - a) - self.assertAllClose(uniform.variance().eval(), s_uniform.var()) - - def testUniformStd(self): - with self.test_session(): - a = 10.0 - b = 100.0 - uniform = uniform_lib.Uniform(low=a, high=b) - s_uniform = stats.uniform(loc=a, scale=b - a) - self.assertAllClose(uniform.stddev().eval(), s_uniform.std()) - - def testUniformNans(self): - with self.test_session(): - a = 10.0 - b = [11.0, 100.0] - uniform = uniform_lib.Uniform(low=a, high=b) - - no_nans = constant_op.constant(1.0) - nans = constant_op.constant(0.0) / constant_op.constant(0.0) - self.assertTrue(math_ops.is_nan(nans).eval()) - with_nans = array_ops.stack([no_nans, nans]) - - pdf = uniform.prob(with_nans) - - is_nan = math_ops.is_nan(pdf).eval() - self.assertFalse(is_nan[0]) - self.assertTrue(is_nan[1]) - - def testUniformSamplePdf(self): - with self.test_session(): - a = 10.0 - b = [11.0, 100.0] - uniform = uniform_lib.Uniform(a, b) - self.assertTrue( - math_ops.reduce_all(uniform.prob(uniform.sample(10)) > 0).eval()) - - def testUniformBroadcasting(self): - with self.test_session(): - a = 10.0 - b = [11.0, 20.0] - uniform = uniform_lib.Uniform(a, b) - - pdf = uniform.prob([[10.5, 11.5], [9.0, 19.0], [10.5, 21.0]]) - expected_pdf = np.array([[1.0, 0.1], [0.0, 0.1], [1.0, 0.0]]) - self.assertAllClose(expected_pdf, pdf.eval()) - - def 
testUniformSampleWithShape(self): - with self.test_session(): - a = 10.0 - b = [11.0, 20.0] - uniform = uniform_lib.Uniform(a, b) - - pdf = uniform.prob(uniform.sample((2, 3))) - # pylint: disable=bad-continuation - expected_pdf = [ - [[1.0, 0.1], [1.0, 0.1], [1.0, 0.1]], - [[1.0, 0.1], [1.0, 0.1], [1.0, 0.1]], - ] - # pylint: enable=bad-continuation - self.assertAllClose(expected_pdf, pdf.eval()) - - pdf = uniform.prob(uniform.sample()) - expected_pdf = [1.0, 0.1] - self.assertAllClose(expected_pdf, pdf.eval()) - - -if __name__ == "__main__": - test.main() diff --git a/tensorflow/contrib/distributions/python/ops/bernoulli.py b/tensorflow/contrib/distributions/python/ops/bernoulli.py deleted file mode 100644 index 3281b57e83..0000000000 --- a/tensorflow/contrib/distributions/python/ops/bernoulli.py +++ /dev/null @@ -1,215 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Bernoulli distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.ops.distributions import util as distribution_util - - -class Bernoulli(distribution.Distribution): - """Bernoulli distribution. - - The Bernoulli distribution with `probs` parameter, i.e., the probability of a - `1` outcome (vs a `0` outcome). - """ - - def __init__(self, - logits=None, - probs=None, - dtype=dtypes.int32, - validate_args=False, - allow_nan_stats=True, - name="Bernoulli"): - """Construct Bernoulli distributions. - - Args: - logits: An N-D `Tensor` representing the log-odds of a `1` event. Each - entry in the `Tensor` parametrizes an independent Bernoulli distribution - where the probability of an event is sigmoid(logits). Only one of - `logits` or `probs` should be passed in. - probs: An N-D `Tensor` representing the probability of a `1` - event. Each entry in the `Tensor` parameterizes an independent - Bernoulli distribution. Only one of `logits` or `probs` should be passed - in. - dtype: The type of the event samples. Default: `int32`. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. 
When `True`, - statistics (e.g., mean, mode, variance) use the value "`NaN`" to - indicate the result is undefined. When `False`, an exception is raised - if one or more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - - Raises: - ValueError: If p and logits are passed, or if neither are passed. - """ - parameters = locals() - with ops.name_scope(name): - self._logits, self._probs = distribution_util.get_logits_and_probs( - logits=logits, - probs=probs, - validate_args=validate_args, - name=name) - super(Bernoulli, self).__init__( - dtype=dtype, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - parameters=parameters, - graph_parents=[self._logits, self._probs], - name=name) - - @staticmethod - def _param_shapes(sample_shape): - return {"logits": ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)} - - @property - def logits(self): - """Log-odds of a `1` outcome (vs `0`).""" - return self._logits - - @property - def probs(self): - """Probability of a `1` outcome (vs `0`).""" - return self._probs - - def _batch_shape_tensor(self): - return array_ops.shape(self._logits) - - def _batch_shape(self): - return self._logits.get_shape() - - def _event_shape_tensor(self): - return array_ops.constant([], dtype=dtypes.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - def _sample_n(self, n, seed=None): - new_shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) - uniform = random_ops.random_uniform( - new_shape, seed=seed, dtype=self.probs.dtype) - sample = math_ops.less(uniform, self.probs) - return math_ops.cast(sample, self.dtype) - - def _log_prob(self, event): - event = self._maybe_assert_valid_sample(event) - # TODO(jaana): The current sigmoid_cross_entropy_with_logits has - # inconsistent behavior for logits = inf/-inf. - event = math_ops.cast(event, self.logits.dtype) - logits = self.logits - # sigmoid_cross_entropy_with_logits doesn't broadcast shape, - # so we do this here. - - def _broadcast(logits, event): - return (array_ops.ones_like(event) * logits, - array_ops.ones_like(logits) * event) - - # First check static shape. - if (event.get_shape().is_fully_defined() and - logits.get_shape().is_fully_defined()): - if event.get_shape() != logits.get_shape(): - logits, event = _broadcast(logits, event) - else: - logits, event = control_flow_ops.cond( - distribution_util.same_dynamic_shape(logits, event), - lambda: (logits, event), - lambda: _broadcast(logits, event)) - return -nn.sigmoid_cross_entropy_with_logits(labels=event, logits=logits) - - def _prob(self, event): - return math_ops.exp(self._log_prob(event)) - - def _entropy(self): - return (-self.logits * (math_ops.sigmoid(self.logits) - 1) + - nn.softplus(-self.logits)) - - def _mean(self): - return array_ops.identity(self.probs) - - def _variance(self): - return self._mean() * (1. 
- self.probs) - - def _mode(self): - """Returns `1` if `prob > 0.5` and `0` otherwise.""" - return math_ops.cast(self.probs > 0.5, self.dtype) - - def _maybe_assert_valid_sample(self, event, check_integer=True): - if not self.validate_args: - return event - event = distribution_util.embed_check_nonnegative_discrete( - event, check_integer=check_integer) - return control_flow_ops.with_dependencies([ - check_ops.assert_less_equal( - event, array_ops.ones_like(event), - message="event is not less than or equal to 1."), - ], event) - - -class BernoulliWithSigmoidProbs(Bernoulli): - """Bernoulli with `probs = nn.sigmoid(logits)`.""" - - def __init__(self, - logits=None, - dtype=dtypes.int32, - validate_args=False, - allow_nan_stats=True, - name="BernoulliWithSigmoidProbs"): - parameters = locals() - with ops.name_scope(name): - super(BernoulliWithSigmoidProbs, self).__init__( - probs=nn.sigmoid(logits, name="sigmoid_probs"), - dtype=dtype, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - name=name) - self._parameters = parameters - - -@kullback_leibler.RegisterKL(Bernoulli, Bernoulli) -def _kl_bernoulli_bernoulli(a, b, name=None): - """Calculate the batched KL divergence KL(a || b) with a and b Bernoulli. - - Args: - a: instance of a Bernoulli distribution object. - b: instance of a Bernoulli distribution object. - name: (optional) Name to use for created operations. - default is "kl_bernoulli_bernoulli". - - Returns: - Batchwise KL(a || b) - """ - with ops.name_scope(name, "kl_bernoulli_bernoulli", - values=[a.logits, b.logits]): - delta_probs0 = nn.softplus(-b.logits) - nn.softplus(-a.logits) - delta_probs1 = nn.softplus(b.logits) - nn.softplus(a.logits) - return (math_ops.sigmoid(a.logits) * delta_probs0 - + math_ops.sigmoid(-a.logits) * delta_probs1) diff --git a/tensorflow/contrib/distributions/python/ops/beta.py b/tensorflow/contrib/distributions/python/ops/beta.py deleted file mode 100644 index 2b93478cdf..0000000000 --- a/tensorflow/contrib/distributions/python/ops/beta.py +++ /dev/null @@ -1,366 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
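The `_kl_bernoulli_bernoulli` helper removed just above uses a softplus/sigmoid form of the Bernoulli KL divergence; a minimal NumPy sketch (illustrative values, not part of this patch) showing it agrees with the textbook expression:

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a_logits, b_logits = 0.3, -1.2          # illustrative logits only
# Softplus form, mirroring the removed helper.
delta0 = softplus(-b_logits) - softplus(-a_logits)
delta1 = softplus(b_logits) - softplus(a_logits)
kl_softplus = sigmoid(a_logits) * delta0 + sigmoid(-a_logits) * delta1
# Textbook form: p*log(p/q) + (1-p)*log((1-p)/(1-q)).
p, q = sigmoid(a_logits), sigmoid(b_logits)
kl_direct = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
np.testing.assert_allclose(kl_softplus, kl_direct)
```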
-# ============================================================================== -"""The Beta distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.ops.distributions import util as distribution_util - - -__all__ = [ - "Beta", - "BetaWithSoftplusConcentration", -] - - -_beta_sample_note = """Note: `x` must have dtype `self.dtype` and be in -`[0, 1].` It must have a shape compatible with `self.batch_shape()`.""" - - -class Beta(distribution.Distribution): - """Beta distribution. - - The Beta distribution is defined over the `(0, 1)` interval using parameters - `concentration1` (aka "alpha") and `concentration0` (aka "beta"). - - #### Mathematical Details - - The probability density function (pdf) is, - - ```none - pdf(x; alpha, beta) = x**(alpha - 1) (1 - x)**(beta - 1) / Z - Z = Gamma(alpha) Gamma(beta) / Gamma(alpha + beta) - ``` - - where: - - * `concentration1 = alpha`, - * `concentration0 = beta`, - * `Z` is the normalization constant, and, - * `Gamma` is the [gamma function]( - https://en.wikipedia.org/wiki/Gamma_function). - - The concentration parameters represent mean total counts of a `1` or a `0`, - i.e., - - ```none - concentration1 = alpha = mean * total_concentration - concentration0 = beta = (1. - mean) * total_concentration - ``` - - where `mean` in `(0, 1)` and `total_concentration` is a positive real number - representing a mean `total_count = concentration1 + concentration0`. - - Distribution parameters are automatically broadcast in all functions; see - examples for details. - - #### Examples - - ```python - # Create a batch of three Beta distributions. - alpha = [1, 2, 3] - beta = [1, 2, 3] - dist = Beta(alpha, beta) - - dist.sample([4, 5]) # Shape [4, 5, 3] - - # `x` has three batch entries, each with two samples. - x = [[.1, .4, .5], - [.2, .3, .5]] - # Calculate the probability of each pair of samples under the corresponding - # distribution in `dist`. - dist.prob(x) # Shape [2, 3] - ``` - - ```python - # Create batch_shape=[2, 3] via parameter broadcast: - alpha = [[1.], [2]] # Shape [2, 1] - beta = [3., 4, 5] # Shape [3] - dist = Beta(alpha, beta) - - # alpha broadcast as: [[1., 1, 1,], - # [2, 2, 2]] - # beta broadcast as: [[3., 4, 5], - # [3, 4, 5]] - # batch_Shape [2, 3] - dist.sample([4, 5]) # Shape [4, 5, 2, 3] - - x = [.2, .3, .5] - # x will be broadcast as [[.2, .3, .5], - # [.2, .3, .5]], - # thus matching batch_shape [2, 3]. - dist.prob(x) # Shape [2, 3] - ``` - - """ - - def __init__(self, - concentration1=None, - concentration0=None, - validate_args=False, - allow_nan_stats=True, - name="Beta"): - """Initialize a batch of Beta distributions. - - Args: - concentration1: Positive floating-point `Tensor` indicating mean - number of successes; aka "alpha". 
Implies `self.dtype` and - `self.batch_shape`, i.e., - `concentration1.shape = [N1, N2, ..., Nm] = self.batch_shape`. - concentration0: Positive floating-point `Tensor` indicating mean - number of failures; aka "beta". Otherwise has same semantics as - `concentration1`. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - """ - parameters = locals() - with ops.name_scope(name, values=[concentration1, concentration0]): - self._concentration1 = self._maybe_assert_valid_concentration( - ops.convert_to_tensor(concentration1, name="concentration1"), - validate_args) - self._concentration0 = self._maybe_assert_valid_concentration( - ops.convert_to_tensor(concentration0, name="concentration0"), - validate_args) - check_ops.assert_same_float_dtype([ - self._concentration1, self._concentration0]) - self._total_concentration = self._concentration1 + self._concentration0 - super(Beta, self).__init__( - dtype=self._total_concentration.dtype, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - parameters=parameters, - graph_parents=[self._concentration1, - self._concentration0, - self._total_concentration], - name=name) - - @staticmethod - def _param_shapes(sample_shape): - return dict(zip( - ["concentration1", "concentration0"], - [ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)] * 2)) - - @property - def concentration1(self): - """Concentration parameter associated with a `1` outcome.""" - return self._concentration1 - - @property - def concentration0(self): - """Concentration parameter associated with a `0` outcome.""" - return self._concentration0 - - @property - def total_concentration(self): - """Sum of concentration parameters.""" - return self._total_concentration - - def _batch_shape_tensor(self): - return array_ops.shape(self.total_concentration) - - def _batch_shape(self): - return self.total_concentration.get_shape() - - def _event_shape_tensor(self): - return constant_op.constant([], dtype=dtypes.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - def _sample_n(self, n, seed=None): - expanded_concentration1 = array_ops.ones_like( - self.total_concentration, dtype=self.dtype) * self.concentration1 - expanded_concentration0 = array_ops.ones_like( - self.total_concentration, dtype=self.dtype) * self.concentration0 - gamma1_sample = random_ops.random_gamma( - shape=[n], - alpha=expanded_concentration1, - dtype=self.dtype, - seed=seed) - gamma2_sample = random_ops.random_gamma( - shape=[n], - alpha=expanded_concentration0, - dtype=self.dtype, - seed=distribution_util.gen_new_seed(seed, "beta")) - beta_sample = gamma1_sample / (gamma1_sample + gamma2_sample) - return beta_sample - - @distribution_util.AppendDocstring(_beta_sample_note) - def _log_prob(self, x): - return self._log_unnormalized_prob(x) - self._log_normalization() - - @distribution_util.AppendDocstring(_beta_sample_note) - def _prob(self, x): - return math_ops.exp(self._log_prob(x)) - - 
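The `_log_prob` just above splits the Beta log-density into an unnormalized term and a normalization constant (both helpers appear further down in this hunk). A minimal NumPy/SciPy sketch of the same split, with illustrative values that are not part of this patch:

```python
import numpy as np
from scipy import special, stats

# Illustrative parameters; names mirror the Beta docstring (alpha, beta).
alpha, beta, x = 2.0, 3.0, 0.25
log_unnormalized = (alpha - 1.) * np.log(x) + (beta - 1.) * np.log1p(-x)
log_normalization = (special.gammaln(alpha) + special.gammaln(beta)
                     - special.gammaln(alpha + beta))
# The two terms recombine into the usual Beta log-density.
np.testing.assert_allclose(log_unnormalized - log_normalization,
                           stats.beta.logpdf(x, alpha, beta))
```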
@distribution_util.AppendDocstring(_beta_sample_note) - def _log_cdf(self, x): - return math_ops.log(self._cdf(x)) - - @distribution_util.AppendDocstring(_beta_sample_note) - def _cdf(self, x): - return math_ops.betainc(self.concentration1, self.concentration0, x) - - def _log_unnormalized_prob(self, x): - x = self._maybe_assert_valid_sample(x) - return ((self.concentration1 - 1.) * math_ops.log(x) - + (self.concentration0 - 1.) * math_ops.log1p(-x)) - - def _log_normalization(self): - return (math_ops.lgamma(self.concentration1) - + math_ops.lgamma(self.concentration0) - - math_ops.lgamma(self.total_concentration)) - - def _entropy(self): - return ( - self._log_normalization() - - (self.concentration1 - 1.) * math_ops.digamma(self.concentration1) - - (self.concentration0 - 1.) * math_ops.digamma(self.concentration0) - + ((self.total_concentration - 2.) * - math_ops.digamma(self.total_concentration))) - - def _mean(self): - return self._concentration1 / self._total_concentration - - def _variance(self): - return self._mean() * (1. - self._mean()) / (1. + self.total_concentration) - - @distribution_util.AppendDocstring( - """Note: The mode is undefined when `concentration1 <= 1` or - `concentration0 <= 1`. If `self.allow_nan_stats` is `True`, `NaN` - is used for undefined modes. If `self.allow_nan_stats` is `False` an - exception is raised when one or more modes are undefined.""") - def _mode(self): - mode = (self.concentration1 - 1.) / (self.total_concentration - 2.) - if self.allow_nan_stats: - nan = array_ops.fill( - self.batch_shape_tensor(), - np.array(np.nan, dtype=self.dtype.as_numpy_dtype()), - name="nan") - is_defined = math_ops.logical_and(self.concentration1 > 1., - self.concentration0 > 1.) - return array_ops.where(is_defined, mode, nan) - return control_flow_ops.with_dependencies([ - check_ops.assert_less( - array_ops.ones([], dtype=self.dtype), - self.concentration1, - message="Mode undefined for concentration1 <= 1."), - check_ops.assert_less( - array_ops.ones([], dtype=self.dtype), - self.concentration0, - message="Mode undefined for concentration0 <= 1.") - ], mode) - - def _maybe_assert_valid_concentration(self, concentration, validate_args): - """Checks the validity of a concentration parameter.""" - if not validate_args: - return concentration - return control_flow_ops.with_dependencies([ - check_ops.assert_positive( - concentration, - message="Concentration parameter must be positive."), - ], concentration) - - def _maybe_assert_valid_sample(self, x): - """Checks the validity of a sample.""" - if not self.validate_args: - return x - return control_flow_ops.with_dependencies([ - check_ops.assert_positive( - x, - message="sample must be positive"), - check_ops.assert_less( - x, array_ops.ones([], self.dtype), - message="sample must be no larger than `1`."), - ], x) - - -class BetaWithSoftplusConcentration(Beta): - """Beta with softplus transform of `concentration1` and `concentration0`.""" - - def __init__(self, - concentration1, - concentration0, - validate_args=False, - allow_nan_stats=True, - name="BetaWithSoftplusConcentration"): - parameters = locals() - with ops.name_scope(name, values=[concentration1, - concentration0]) as ns: - super(BetaWithSoftplusConcentration, self).__init__( - concentration1=nn.softplus(concentration1, - name="softplus_concentration1"), - concentration0=nn.softplus(concentration0, - name="softplus_concentration0"), - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - name=ns) - self._parameters = parameters - - 
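`BetaWithSoftplusConcentration` above exists so that an optimizer can work with unconstrained real-valued parameters: softplus maps any real number to a strictly positive concentration. A small NumPy sketch (illustrative only, not part of this patch):

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

# Any real-valued, unconstrained parameters...
unconstrained = np.array([-3.0, 0.0, 2.5])
# ...become valid (strictly positive) Beta concentrations.
print(softplus(unconstrained))
```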
-@kullback_leibler.RegisterKL(Beta, Beta) -def _kl_beta_beta(d1, d2, name=None): - """Calculate the batchwise KL divergence KL(d1 || d2) with d1 and d2 Beta. - - Args: - d1: instance of a Beta distribution object. - d2: instance of a Beta distribution object. - name: (optional) Name to use for created operations. - default is "kl_beta_beta". - - Returns: - Batchwise KL(d1 || d2) - """ - def delta(fn, is_property=True): - fn1 = getattr(d1, fn) - fn2 = getattr(d2, fn) - return (fn2 - fn1) if is_property else (fn2() - fn1()) - with ops.name_scope(name, "kl_beta_beta", values=[ - d1.concentration1, - d1.concentration0, - d1.total_concentration, - d2.concentration1, - d2.concentration0, - d2.total_concentration, - ]): - return (delta("_log_normalization", is_property=False) - - math_ops.digamma(d1.concentration1) * delta("concentration1") - - math_ops.digamma(d1.concentration0) * delta("concentration0") - + (math_ops.digamma(d1.total_concentration) - * delta("total_concentration"))) diff --git a/tensorflow/contrib/distributions/python/ops/bijectors/bijector_test_util.py b/tensorflow/contrib/distributions/python/ops/bijectors/bijector_test_util.py index a083442332..ff3535c626 100644 --- a/tensorflow/contrib/distributions/python/ops/bijectors/bijector_test_util.py +++ b/tensorflow/contrib/distributions/python/ops/bijectors/bijector_test_util.py @@ -20,9 +20,9 @@ from __future__ import print_function import numpy as np -from tensorflow.contrib.distributions.python.ops import uniform as uniform_lib from tensorflow.python.framework import ops from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import uniform as uniform_lib def assert_finite(array): diff --git a/tensorflow/contrib/distributions/python/ops/categorical.py b/tensorflow/contrib/distributions/python/ops/categorical.py deleted file mode 100644 index 1b74c2f0ca..0000000000 --- a/tensorflow/contrib/distributions/python/ops/categorical.py +++ /dev/null @@ -1,242 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Categorical distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn_ops -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.ops.distributions import util as distribution_util - - -class Categorical(distribution.Distribution): - """Categorical distribution. 
- - The categorical distribution is parameterized by the log-probabilities - of a set of classes. - - #### Examples - - Creates a 3-class distribution, with the 2nd class, the most likely to be - drawn from. - - ```python - p = [0.1, 0.5, 0.4] - dist = Categorical(probs=p) - ``` - - Creates a 3-class distribution, with the 2nd class the most likely to be - drawn from, using logits. - - ```python - logits = [-50, 400, 40] - dist = Categorical(logits=logits) - ``` - - Creates a 3-class distribution, with the 3rd class is most likely to be drawn. - The distribution functions can be evaluated on counts. - - ```python - # counts is a scalar. - p = [0.1, 0.4, 0.5] - dist = Categorical(probs=p) - dist.prob(0) # Shape [] - - # p will be broadcast to [[0.1, 0.4, 0.5], [0.1, 0.4, 0.5]] to match counts. - counts = [1, 0] - dist.prob(counts) # Shape [2] - - # p will be broadcast to shape [3, 5, 7, 3] to match counts. - counts = [[...]] # Shape [5, 7, 3] - dist.prob(counts) # Shape [5, 7, 3] - ``` - - """ - - def __init__( - self, - logits=None, - probs=None, - dtype=dtypes.int32, - validate_args=False, - allow_nan_stats=True, - name="Categorical"): - """Initialize Categorical distributions using class log-probabilities. - - Args: - logits: An N-D `Tensor`, `N >= 1`, representing the log probabilities - of a set of Categorical distributions. The first `N - 1` dimensions - index into a batch of independent distributions and the last dimension - represents a vector of logits for each class. Only one of `logits` or - `probs` should be passed in. - probs: An N-D `Tensor`, `N >= 1`, representing the probabilities - of a set of Categorical distributions. The first `N - 1` dimensions - index into a batch of independent distributions and the last dimension - represents a vector of probabilities for each class. Only one of - `logits` or `probs` should be passed in. - dtype: The type of the event samples (default: int32). - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. 
- """ - parameters = locals() - with ops.name_scope(name, values=[logits, probs]): - self._logits, self._probs = distribution_util.get_logits_and_probs( - logits=logits, - probs=probs, - validate_args=validate_args, - multidimensional=True, - name=name) - - logits_shape_static = self._logits.get_shape().with_rank_at_least(1) - if logits_shape_static.ndims is not None: - self._batch_rank = ops.convert_to_tensor( - logits_shape_static.ndims - 1, - dtype=dtypes.int32, - name="batch_rank") - else: - with ops.name_scope(name="batch_rank"): - self._batch_rank = array_ops.rank(self._logits) - 1 - - logits_shape = array_ops.shape(self._logits, name="logits_shape") - if logits_shape_static[-1].value is not None: - self._event_size = ops.convert_to_tensor( - logits_shape_static[-1].value, - dtype=dtypes.int32, - name="event_size") - else: - with ops.name_scope(name="event_size"): - self._event_size = logits_shape[self._batch_rank] - - if logits_shape_static[:-1].is_fully_defined(): - self._batch_shape_val = constant_op.constant( - logits_shape_static[:-1].as_list(), - dtype=dtypes.int32, - name="batch_shape") - else: - with ops.name_scope(name="batch_shape"): - self._batch_shape_val = logits_shape[:-1] - super(Categorical, self).__init__( - dtype=dtype, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - parameters=parameters, - graph_parents=[self._logits, - self._probs], - name=name) - - @property - def event_size(self): - """Scalar `int32` tensor: the number of classes.""" - return self._event_size - - @property - def logits(self): - """Vector of coordinatewise logits.""" - return self._logits - - @property - def probs(self): - """Vector of coordinatewise probabilities.""" - return self._probs - - def _batch_shape_tensor(self): - return array_ops.identity(self._batch_shape_val) - - def _batch_shape(self): - return self.logits.get_shape()[:-1] - - def _event_shape_tensor(self): - return constant_op.constant([], dtype=dtypes.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - def _sample_n(self, n, seed=None): - if self.logits.get_shape().ndims == 2: - logits_2d = self.logits - else: - logits_2d = array_ops.reshape(self.logits, [-1, self.event_size]) - samples = random_ops.multinomial(logits_2d, n, seed=seed) - samples = math_ops.cast(samples, self.dtype) - ret = array_ops.reshape( - array_ops.transpose(samples), - array_ops.concat([[n], self.batch_shape_tensor()], 0)) - return ret - - def _log_prob(self, k): - k = ops.convert_to_tensor(k, name="k") - if self.logits.get_shape()[:-1] == k.get_shape(): - logits = self.logits - else: - logits = self.logits * array_ops.ones_like( - array_ops.expand_dims(k, -1), dtype=self.logits.dtype) - logits_shape = array_ops.shape(logits)[:-1] - k *= array_ops.ones(logits_shape, dtype=k.dtype) - k.set_shape(tensor_shape.TensorShape(logits.get_shape()[:-1])) - return -nn_ops.sparse_softmax_cross_entropy_with_logits(labels=k, - logits=logits) - - def _prob(self, k): - return math_ops.exp(self._log_prob(k)) - - def _entropy(self): - return -math_ops.reduce_sum( - nn_ops.log_softmax(self.logits) * self.probs, axis=-1) - - def _mode(self): - ret = math_ops.argmax(self.logits, dimension=self._batch_rank) - ret = math_ops.cast(ret, self.dtype) - ret.set_shape(self.batch_shape) - return ret - - -@kullback_leibler.RegisterKL(Categorical, Categorical) -def _kl_categorical_categorical(a, b, name=None): - """Calculate the batched KL divergence KL(a || b) with a and b Categorical. 
- - Args: - a: instance of a Categorical distribution object. - b: instance of a Categorical distribution object. - name: (optional) Name to use for created operations. - default is "kl_categorical_categorical". - - Returns: - Batchwise KL(a || b) - """ - with ops.name_scope(name, "kl_categorical_categorical", - values=[a.logits, b.logits]): - # sum(probs log(probs / (1 - probs))) - delta_log_probs1 = (nn_ops.log_softmax(a.logits) - - nn_ops.log_softmax(b.logits)) - return math_ops.reduce_sum(nn_ops.softmax(a.logits) * delta_log_probs1, - axis=-1) diff --git a/tensorflow/contrib/distributions/python/ops/chi2.py b/tensorflow/contrib/distributions/python/ops/chi2.py index 45d3accdd6..bdd5571c96 100644 --- a/tensorflow/contrib/distributions/python/ops/chi2.py +++ b/tensorflow/contrib/distributions/python/ops/chi2.py @@ -18,11 +18,11 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function -from tensorflow.contrib.distributions.python.ops import gamma from tensorflow.python.framework import constant_op from tensorflow.python.framework import dtypes from tensorflow.python.framework import ops from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import gamma __all__ = [ diff --git a/tensorflow/contrib/distributions/python/ops/dirichlet.py b/tensorflow/contrib/distributions/python/ops/dirichlet.py deleted file mode 100644 index 923696a553..0000000000 --- a/tensorflow/contrib/distributions/python/ops/dirichlet.py +++ /dev/null @@ -1,297 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Dirichlet distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - -from tensorflow.python.framework import ops -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import random_ops -from tensorflow.python.ops import special_math_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import util as distribution_util - - -__all__ = [ - "Dirichlet", -] - - -_dirichlet_sample_note = """Note: `value` must be a non-negative tensor with -dtype `self.dtype` and be in the `(self.event_shape() - 1)`-simplex, i.e., -`tf.reduce_sum(value, -1) = 1`. It must have a shape compatible with -`self.batch_shape() + self.event_shape()`.""" - - -class Dirichlet(distribution.Distribution): - """Dirichlet distribution. - - The Dirichlet distribution is defined over the - [`(k-1)`-simplex](https://en.wikipedia.org/wiki/Simplex) using a positive, - length-`k` vector `concentration` (`k > 1`). The Dirichlet is identically the - Beta distribution when `k = 2`. 
- - #### Mathematical Details - - The Dirichlet is a distribution over the open `(k-1)`-simplex, i.e., - - ```none - S^{k-1} = { (x_0, ..., x_{k-1}) in R^k : sum_j x_j = 1 and all_j x_j > 0 }. - ``` - - The probability density function (pdf) is, - - ```none - pdf(x; alpha) = prod_j x_j**(alpha_j - 1) / Z - Z = prod_j Gamma(alpha_j) / Gamma(sum_j alpha_j) - ``` - - where: - - * `x in S^{k-1}`, i.e., the `(k-1)`-simplex, - * `concentration = alpha = [alpha_0, ..., alpha_{k-1}]`, `alpha_j > 0`, - * `Z` is the normalization constant aka the [multivariate beta function]( - https://en.wikipedia.org/wiki/Beta_function#Multivariate_beta_function), - and, - * `Gamma` is the [gamma function]( - https://en.wikipedia.org/wiki/Gamma_function). - - The `concentration` represents mean total counts of class occurrence, i.e., - - ```none - concentration = alpha = mean * total_concentration - ``` - - where `mean` in `S^{k-1}` and `total_concentration` is a positive real number - representing a mean total count. - - Distribution parameters are automatically broadcast in all functions; see - examples for details. - - #### Examples - - ```python - # Create a single trivariate Dirichlet, with the 3rd class being three times - # more frequent than the first. I.e., batch_shape=[], event_shape=[3]. - alpha = [1., 2, 3] - dist = Dirichlet(alpha) - - dist.sample([4, 5]) # shape: [4, 5, 3] - - # x has one sample, one batch, three classes: - x = [.2, .3, .5] # shape: [3] - dist.prob(x) # shape: [] - - # x has two samples from one batch: - x = [[.1, .4, .5], - [.2, .3, .5]] - dist.prob(x) # shape: [2] - - # alpha will be broadcast to shape [5, 7, 3] to match x. - x = [[...]] # shape: [5, 7, 3] - dist.prob(x) # shape: [5, 7] - ``` - - ```python - # Create batch_shape=[2], event_shape=[3]: - alpha = [[1., 2, 3], - [4, 5, 6]] # shape: [2, 3] - dist = Dirichlet(alpha) - - dist.sample([4, 5]) # shape: [4, 5, 2, 3] - - x = [.2, .3, .5] - # x will be broadcast as [[.2, .3, .5], - # [.2, .3, .5]], - # thus matching batch_shape [2, 3]. - dist.prob(x) # shape: [2] - ``` - - """ - - def __init__(self, - concentration, - validate_args=False, - allow_nan_stats=True, - name="Dirichlet"): - """Initialize a batch of Dirichlet distributions. - - Args: - concentration: Positive floating-point `Tensor` indicating mean number - of class occurrences; aka "alpha". Implies `self.dtype`, and - `self.batch_shape`, `self.event_shape`, i.e., if - `concentration.shape = [N1, N2, ..., Nm, k]` then - `batch_shape = [N1, N2, ..., Nm]` and - `event_shape = [k]`. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. 
- """ - parameters = locals() - with ops.name_scope(name, values=[concentration]): - self._concentration = self._maybe_assert_valid_concentration( - ops.convert_to_tensor(concentration, name="concentration"), - validate_args) - self._total_concentration = math_ops.reduce_sum(self._concentration, -1) - super(Dirichlet, self).__init__( - dtype=self._concentration.dtype, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - parameters=parameters, - graph_parents=[self._concentration, - self._total_concentration], - name=name) - - @property - def concentration(self): - """Concentration parameter; expected counts for that coordinate.""" - return self._concentration - - @property - def total_concentration(self): - """Sum of last dim of concentration parameter.""" - return self._total_concentration - - def _batch_shape_tensor(self): - return array_ops.shape(self.total_concentration) - - def _batch_shape(self): - return self.total_concentration.get_shape() - - def _event_shape_tensor(self): - return array_ops.shape(self.concentration)[-1:] - - def _event_shape(self): - return self.concentration.get_shape().with_rank_at_least(1)[-1:] - - def _sample_n(self, n, seed=None): - gamma_sample = random_ops.random_gamma( - shape=[n], - alpha=self.concentration, - dtype=self.dtype, - seed=seed) - return gamma_sample / math_ops.reduce_sum(gamma_sample, -1, keep_dims=True) - - @distribution_util.AppendDocstring(_dirichlet_sample_note) - def _log_prob(self, x): - return self._log_unnormalized_prob(x) - self._log_normalization() - - @distribution_util.AppendDocstring(_dirichlet_sample_note) - def _prob(self, x): - return math_ops.exp(self._log_prob(x)) - - def _log_unnormalized_prob(self, x): - x = self._maybe_assert_valid_sample(x) - return math_ops.reduce_sum((self.concentration - 1.) * math_ops.log(x), -1) - - def _log_normalization(self): - return special_math_ops.lbeta(self.concentration) - - def _entropy(self): - k = math_ops.cast(self.event_shape_tensor()[0], self.dtype) - return ( - self._log_normalization() - + ((self.total_concentration - k) - * math_ops.digamma(self.total_concentration)) - - math_ops.reduce_sum( - (self.concentration - 1.) * math_ops.digamma(self.concentration), - axis=-1)) - - def _mean(self): - return self.concentration / self.total_concentration[..., array_ops.newaxis] - - def _covariance(self): - x = self._variance_scale_term() * self._mean() - return array_ops.matrix_set_diag( - -math_ops.matmul(x[..., array_ops.newaxis], - x[..., array_ops.newaxis, :]), # outer prod - self._variance()) - - def _variance(self): - scale = self._variance_scale_term() - x = scale * self._mean() - return x * (scale - x) - - def _variance_scale_term(self): - """Helper to `_covariance` and `_variance` which computes a shared scale.""" - return math_ops.rsqrt(1. + self.total_concentration[..., array_ops.newaxis]) - - @distribution_util.AppendDocstring( - """Note: The mode is undefined when any `concentration <= 1`. If - `self.allow_nan_stats` is `True`, `NaN` is used for undefined modes. If - `self.allow_nan_stats` is `False` an exception is raised when one or more - modes are undefined.""") - def _mode(self): - k = math_ops.cast(self.event_shape_tensor()[0], self.dtype) - mode = (self.concentration - 1.) 
/ ( - self.total_concentration[..., array_ops.newaxis] - k) - if self.allow_nan_stats: - nan = array_ops.fill( - array_ops.shape(mode), - np.array(np.nan, dtype=self.dtype.as_numpy_dtype()), - name="nan") - return array_ops.where( - math_ops.reduce_all(self.concentration > 1., axis=-1), - mode, nan) - return control_flow_ops.with_dependencies([ - check_ops.assert_less( - array_ops.ones([], self.dtype), - self.concentration, - message="Mode undefined when any concentration <= 1"), - ], mode) - - def _maybe_assert_valid_concentration(self, concentration, validate_args): - """Checks the validity of the concentration parameter.""" - if not validate_args: - return concentration - return control_flow_ops.with_dependencies([ - check_ops.assert_positive( - concentration, - message="Concentration parameter must be positive."), - check_ops.assert_rank_at_least( - concentration, 1, - message="Concentration parameter must have >=1 dimensions."), - check_ops.assert_less( - 1, array_ops.shape(concentration)[-1], - message="Concentration parameter must have event_size >= 2."), - ], concentration) - - def _maybe_assert_valid_sample(self, x): - """Checks the validity of a sample.""" - if not self.validate_args: - return x - return control_flow_ops.with_dependencies([ - check_ops.assert_positive( - x, - message="samples must be positive"), - distribution_util.assert_close( - array_ops.ones([], dtype=self.dtype), - math_ops.reduce_sum(x, -1), - message="sample last-dimension must sum to `1`"), - ], x) diff --git a/tensorflow/contrib/distributions/python/ops/dirichlet_multinomial.py b/tensorflow/contrib/distributions/python/ops/dirichlet_multinomial.py deleted file mode 100644 index 662a765558..0000000000 --- a/tensorflow/contrib/distributions/python/ops/dirichlet_multinomial.py +++ /dev/null @@ -1,343 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The DirichletMultinomial distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import random_ops -from tensorflow.python.ops import special_math_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import util as distribution_util - - -__all__ = [ - "DirichletMultinomial", -] - - -_dirichlet_multinomial_sample_note = """For each batch of counts, -`value = [n_0, ..., n_{k-1}]`, `P[value]` is the probability that after -sampling `self.total_count` draws from this Dirichlet-Multinomial distribution, -the number of draws falling in class `j` is `n_j`. 
Since this definition is -[exchangeable](https://en.wikipedia.org/wiki/Exchangeable_random_variables); -different sequences have the same counts so the probability includes a -combinatorial coefficient. - -Note: `value` must be a non-negative tensor with dtype `self.dtype`, have no -fractional components, and such that -`tf.reduce_sum(value, -1) = self.total_count`. Its shape must be broadcastable -with `self.concentration` and `self.total_count`.""" - - -class DirichletMultinomial(distribution.Distribution): - """Dirichlet-Multinomial compound distribution. - - The Dirichlet-Multinomial distribution is parameterized by a (batch of) - length-`k` `concentration` vectors (`k > 1`) and a `total_count` number of - trials, i.e., the number of trials per draw from the DirichletMultinomial. It - is defined over a (batch of) length-`k` vector `counts` such that - `tf.reduce_sum(counts, -1) = total_count`. The Dirichlet-Multinomial is - identically the Beta-Binomial distribution when `k = 2`. - - #### Mathematical Details - - The Dirichlet-Multinomial is a distribution over `k`-class counts, i.e., a - length-`k` vector of non-negative integer `counts = n = [n_0, ..., n_{k-1}]`. - - The probability mass function (pmf) is, - - ```none - pmf(n; alpha, N) = Beta(alpha + n) / (prod_j n_j!) / Z - Z = Beta(alpha) / N! - ``` - - where: - - * `concentration = alpha = [alpha_0, ..., alpha_{k-1}]`, `alpha_j > 0`, - * `total_count = N`, `N` a positive integer, - * `N!` is `N` factorial, and, - * `Beta(x) = prod_j Gamma(x_j) / Gamma(sum_j x_j)` is the - [multivariate beta function]( - https://en.wikipedia.org/wiki/Beta_function#Multivariate_beta_function), - and, - * `Gamma` is the [gamma function]( - https://en.wikipedia.org/wiki/Gamma_function). - - Dirichlet-Multinomial is a [compound distribution]( - https://en.wikipedia.org/wiki/Compound_probability_distribution), i.e., its - samples are generated as follows. - - 1. Choose class probabilities: - `probs = [p_0,...,p_{k-1}] ~ Dir(concentration)` - 2. Draw integers: - `counts = [n_0,...,n_{k-1}] ~ Multinomial(total_count, probs)` - - The last `concentration` dimension parametrizes a single Dirichlet-Multinomial - distribution. When calling distribution functions (e.g., `dist.prob(counts)`), - `concentration`, `total_count` and `counts` are broadcast to the same shape. - The last dimension of of `counts` corresponds single Dirichlet-Multinomial - distributions. - - Distribution parameters are automatically broadcast in all functions; see - examples for details. - - #### Examples - - ```python - alpha = [1, 2, 3] - n = 2 - dist = DirichletMultinomial(n, alpha) - ``` - - Creates a 3-class distribution, with the 3rd class is most likely to be drawn. - The distribution functions can be evaluated on counts. - - ```python - # counts same shape as alpha. - counts = [0, 0, 2] - dist.prob(counts) # Shape [] - - # alpha will be broadcast to [[1, 2, 3], [1, 2, 3]] to match counts. - counts = [[1, 1, 0], [1, 0, 1]] - dist.prob(counts) # Shape [2] - - # alpha will be broadcast to shape [5, 7, 3] to match counts. - counts = [[...]] # Shape [5, 7, 3] - dist.prob(counts) # Shape [5, 7] - ``` - - Creates a 2-batch of 3-class distributions. - - ```python - alpha = [[1, 2, 3], [4, 5, 6]] # Shape [2, 3] - n = [3, 3] - dist = DirichletMultinomial(n, alpha) - - # counts will be broadcast to [[2, 1, 0], [2, 1, 0]] to match alpha. - counts = [2, 1, 0] - dist.prob(counts) # Shape [2] - ``` - - """ - - # TODO(b/27419586) Change docstring for dtype of concentration once int - # allowed. 
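The compound sampling scheme described in the `DirichletMultinomial` docstring above (first draw class probabilities from a Dirichlet, then draw counts from a Multinomial) can be sketched directly in NumPy; the values below are illustrative and not part of this patch:

```python
import numpy as np

rng = np.random.RandomState(0)
concentration = np.array([1., 2., 3.])   # "alpha" in the docstring
total_count = 10                         # "N" in the docstring

probs = rng.dirichlet(concentration)            # step 1: probs ~ Dir(alpha)
counts = rng.multinomial(total_count, probs)    # step 2: counts ~ Multinomial(N, probs)
assert counts.sum() == total_count
```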
- def __init__(self, - total_count, - concentration, - validate_args=False, - allow_nan_stats=True, - name="DirichletMultinomial"): - """Initialize a batch of DirichletMultinomial distributions. - - Args: - total_count: Non-negative floating point tensor, whose dtype is the same - as `concentration`. The shape is broadcastable to `[N1,..., Nm]` with - `m >= 0`. Defines this as a batch of `N1 x ... x Nm` different - Dirichlet multinomial distributions. Its components should be equal to - integer values. - concentration: Positive floating point tensor, whose dtype is the - same as `n` with shape broadcastable to `[N1,..., Nm, k]` `m >= 0`. - Defines this as a batch of `N1 x ... x Nm` different `k` class Dirichlet - multinomial distributions. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - """ - parameters = locals() - with ops.name_scope(name, values=[total_count, concentration]): - # Broadcasting works because: - # * The broadcasting convention is to prepend dimensions of size [1], and - # we use the last dimension for the distribution, whereas - # the batch dimensions are the leading dimensions, which forces the - # distribution dimension to be defined explicitly (i.e. it cannot be - # created automatically by prepending). This forces enough explicitness. - # * All calls involving `counts` eventually require a broadcast between - # `counts` and concentration. - self._total_count = self._maybe_assert_valid_total_count( - ops.convert_to_tensor(total_count, name="total_count"), - validate_args) - self._concentration = self._maybe_assert_valid_concentration( - ops.convert_to_tensor(concentration, - name="concentration"), - validate_args) - self._total_concentration = math_ops.reduce_sum(self._concentration, -1) - super(DirichletMultinomial, self).__init__( - dtype=self._concentration.dtype, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - parameters=parameters, - graph_parents=[self._total_count, - self._concentration], - name=name) - - @property - def total_count(self): - """Number of trials used to construct a sample.""" - return self._total_count - - @property - def concentration(self): - """Concentration parameter; expected prior counts for that coordinate.""" - return self._concentration - - @property - def total_concentration(self): - """Sum of last dim of concentration parameter.""" - return self._total_concentration - - def _batch_shape_tensor(self): - return array_ops.shape(self.total_concentration) - - def _batch_shape(self): - return self.total_concentration.get_shape() - - def _event_shape_tensor(self): - return array_ops.shape(self.concentration)[-1:] - - def _event_shape(self): - # Event shape depends only on total_concentration, not "n". 
- return self.concentration.get_shape().with_rank_at_least(1)[-1:] - - def _sample_n(self, n, seed=None): - n_draws = math_ops.cast(self.total_count, dtype=dtypes.int32) - k = self.event_shape_tensor()[0] - unnormalized_logits = array_ops.reshape( - math_ops.log(random_ops.random_gamma( - shape=[n], - alpha=self.concentration, - dtype=self.dtype, - seed=seed)), - shape=[-1, k]) - draws = random_ops.multinomial( - logits=unnormalized_logits, - num_samples=n_draws, - seed=distribution_util.gen_new_seed(seed, salt="dirichlet_multinomial")) - x = math_ops.reduce_sum(array_ops.one_hot(draws, depth=k), -2) - final_shape = array_ops.concat([[n], self.batch_shape_tensor(), [k]], 0) - return array_ops.reshape(x, final_shape) - - @distribution_util.AppendDocstring(_dirichlet_multinomial_sample_note) - def _log_prob(self, counts): - counts = self._maybe_assert_valid_sample(counts) - ordered_prob = ( - special_math_ops.lbeta(self.concentration + counts) - - special_math_ops.lbeta(self.concentration)) - return ordered_prob + distribution_util.log_combinations( - self.total_count, counts) - - @distribution_util.AppendDocstring(_dirichlet_multinomial_sample_note) - def _prob(self, counts): - return math_ops.exp(self._log_prob(counts)) - - def _mean(self): - return self.total_count * (self.concentration / - self.total_concentration[..., array_ops.newaxis]) - - @distribution_util.AppendDocstring( - """The covariance for each batch member is defined as the following: - - ```none - Var(X_j) = n * alpha_j / alpha_0 * (1 - alpha_j / alpha_0) * - (n + alpha_0) / (1 + alpha_0) - ``` - - where `concentration = alpha` and - `total_concentration = alpha_0 = sum_j alpha_j`. - - The covariance between elements in a batch is defined as: - - ```none - Cov(X_i, X_j) = -n * alpha_i * alpha_j / alpha_0 ** 2 * - (n + alpha_0) / (1 + alpha_0) - ``` - """) - def _covariance(self): - x = self._variance_scale_term() * self._mean() - return array_ops.matrix_set_diag( - -math_ops.matmul(x[..., array_ops.newaxis], - x[..., array_ops.newaxis, :]), # outer prod - self._variance()) - - def _variance(self): - scale = self._variance_scale_term() - x = scale * self._mean() - return x * (self.total_count * scale - x) - - def _variance_scale_term(self): - """Helper to `_covariance` and `_variance` which computes a shared scale.""" - # We must take care to expand back the last dim whenever we use the - # total_concentration. - c0 = self.total_concentration[..., array_ops.newaxis] - return math_ops.sqrt((1. + c0 / self.total_count) / (1. 
+ c0)) - - def _maybe_assert_valid_concentration(self, concentration, validate_args): - """Checks the validity of the concentration parameter.""" - if not validate_args: - return concentration - return control_flow_ops.with_dependencies([ - check_ops.assert_positive( - concentration, - message="Concentration parameter must be positive."), - check_ops.assert_rank_at_least( - concentration, 1, - message="Concentration parameter must have >=1 dimensions."), - check_ops.assert_less( - 1, array_ops.shape(concentration)[-1], - message="Concentration parameter must have event_size >= 2."), - ], concentration) - - def _maybe_assert_valid_total_count(self, total_count, validate_args): - if not validate_args: - return total_count - return control_flow_ops.with_dependencies([ - check_ops.assert_non_negative( - total_count, - message="total_count must be non-negative."), - distribution_util.assert_integer_form( - total_count, - message="total_count cannot contain fractional values."), - ], total_count) - - def _maybe_assert_valid_sample(self, counts): - """Check counts for proper shape, values, then return tensor version.""" - if not self.validate_args: - return counts - return control_flow_ops.with_dependencies([ - check_ops.assert_non_negative( - counts, - message="counts must be non-negative."), - check_ops.assert_equal( - self.total_count, math_ops.reduce_sum(counts, -1), - message="counts last-dimension must sum to `self.total_count`"), - distribution_util.assert_integer_form( - counts, - message="counts cannot contain fractional components."), - ], counts) diff --git a/tensorflow/contrib/distributions/python/ops/exponential.py b/tensorflow/contrib/distributions/python/ops/exponential.py deleted file mode 100644 index a293d1e0dc..0000000000 --- a/tensorflow/contrib/distributions/python/ops/exponential.py +++ /dev/null @@ -1,151 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Exponential distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - -from tensorflow.contrib.distributions.python.ops import gamma -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn -from tensorflow.python.ops import random_ops - - -__all__ = [ - "Exponential", - "ExponentialWithSoftplusRate", -] - - -class Exponential(gamma.Gamma): - """Exponential distribution. - - The Exponential distribution is parameterized by an event `rate` parameter. - - #### Mathematical Details - - The probability density function (pdf) is, - - ```none - pdf(x; lambda, x > 0) = exp(-lambda x) / Z - Z = 1 / lambda - ``` - - where `rate = lambda` and `Z` is the normalizaing constant. 
- - The Exponential distribution is a special case of the Gamma distribution, - i.e., - - ```python - Exponential(rate) = Gamma(concentration=1., rate) - ``` - - The Exponential distribution uses a `rate` parameter, or "inverse scale", - which can be intuited as, - - ```none - X ~ Exponential(rate=1) - Y = X / rate - ``` - - """ - - def __init__(self, - rate, - validate_args=False, - allow_nan_stats=True, - name="Exponential"): - """Construct Exponential distribution with parameter `rate`. - - Args: - rate: Floating point tensor, equivalent to `1 / mean`. Must contain only - positive values. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - """ - parameters = locals() - # Even though all statistics of are defined for valid inputs, this is not - # true in the parent class "Gamma." Therefore, passing - # allow_nan_stats=True - # through to the parent class results in unnecessary asserts. - with ops.name_scope(name, values=[rate]): - self._rate = ops.convert_to_tensor(rate, name="rate") - super(Exponential, self).__init__( - concentration=array_ops.ones([], dtype=self._rate.dtype), - rate=self._rate, - allow_nan_stats=allow_nan_stats, - validate_args=validate_args, - name=name) - # While the Gamma distribution is not reparameterizable, the exponential - # distribution is. - self._reparameterization_type = True - self._parameters = parameters - self._graph_parents += [self._rate] - - @staticmethod - def _param_shapes(sample_shape): - return {"rate": ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)} - - @property - def rate(self): - return self._rate - - def _sample_n(self, n, seed=None): - shape = array_ops.concat([[n], array_ops.shape(self._rate)], 0) - # Uniform variates must be sampled from the open-interval `(0, 1)` rather - # than `[0, 1)`. To do so, we use `np.finfo(self.dtype.as_numpy_dtype).tiny` - # because it is the smallest, positive, "normal" number. A "normal" number - # is such that the mantissa has an implicit leading 1. Normal, positive - # numbers x, y have the reasonable property that, `x + y >= max(x, y)`. In - # this case, a subnormal number (i.e., np.nextafter) can cause us to sample - # 0. 
- sampled = random_ops.random_uniform( - shape, - minval=np.finfo(self.dtype.as_numpy_dtype).tiny, - maxval=1., - seed=seed, - dtype=self.dtype) - return -math_ops.log(sampled) / self._rate - - -class ExponentialWithSoftplusRate(Exponential): - """Exponential with softplus transform on `rate`.""" - - def __init__(self, - rate, - validate_args=False, - allow_nan_stats=True, - name="ExponentialWithSoftplusRate"): - parameters = locals() - with ops.name_scope(name, values=[rate]): - super(ExponentialWithSoftplusRate, self).__init__( - rate=nn.softplus(rate, name="softplus_rate"), - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - name=name) - self._parameters = parameters diff --git a/tensorflow/contrib/distributions/python/ops/gamma.py b/tensorflow/contrib/distributions/python/ops/gamma.py deleted file mode 100644 index 4ac2b9b4ef..0000000000 --- a/tensorflow/contrib/distributions/python/ops/gamma.py +++ /dev/null @@ -1,305 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Gamma distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import kullback_leibler -from tensorflow.python.ops.distributions import util as distribution_util - - -__all__ = [ - "Gamma", - "GammaWithSoftplusConcentrationRate", -] - - -class Gamma(distribution.Distribution): - """Gamma distribution. - - The Gamma distribution is defined over positive real numbers using - parameters `concentration` (aka "alpha") and `rate` (aka "beta"). - - #### Mathematical Details - - The probability density function (pdf) is, - - ```none - pdf(x; alpha, beta, x > 0) = x**(alpha - 1) exp(-x beta) / Z - Z = Gamma(alpha) beta**alpha - ``` - - where: - - * `concentration = alpha`, `alpha > 0`, - * `rate = beta`, `beta > 0`, - * `Z` is the normalizing constant, and, - * `Gamma` is the [gamma function]( - https://en.wikipedia.org/wiki/Gamma_function). - - The cumulative density function (cdf) is, - - ```none - cdf(x; alpha, beta, x > 0) = GammaInc(alpha, beta x) / Gamma(alpha) - ``` - - where `GammaInc` is the [lower incomplete Gamma function]( - https://en.wikipedia.org/wiki/Incomplete_gamma_function). 
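The cdf stated above is the regularized lower incomplete gamma function, which is what the `_cdf` method further below computes via `igamma`. A small SciPy cross-check with illustrative values (not part of this patch):

```python
import numpy as np
from scipy import special, stats

alpha, beta, x = 3.0, 2.0, 1.25
np.testing.assert_allclose(
    special.gammainc(alpha, beta * x),              # regularized lower incomplete gamma
    stats.gamma.cdf(x, a=alpha, scale=1.0 / beta))  # same value via the Gamma cdf
```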
- - The parameters can be intuited via their relationship to mean and stddev, - - ```none - concentration = alpha = (mean / stddev)**2 - rate = beta = mean / stddev**2 = concentration / mean - ``` - - Distribution parameters are automatically broadcast in all functions; see - examples for details. - - WARNING: This distribution may draw 0-valued samples for small `concentration` - values. See note in `tf.random_gamma` docstring. - - #### Examples - - ```python - dist = Gamma(concentration=3.0, rate=2.0) - dist2 = Gamma(concentration=[3.0, 4.0], rate=[2.0, 3.0]) - ``` - - """ - - def __init__(self, - concentration, - rate, - validate_args=False, - allow_nan_stats=True, - name="Gamma"): - """Construct Gamma with `concentration` and `rate` parameters. - - The parameters `concentration` and `rate` must be shaped in a way that - supports broadcasting (e.g. `concentration + rate` is a valid operation). - - Args: - concentration: Floating point tensor, the concentration params of the - distribution(s). Must contain only positive values. - rate: Floating point tensor, the inverse scale params of the - distribution(s). Must contain only positive values. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - - Raises: - TypeError: if `concentration` and `rate` are different dtypes. 
- """ - parameters = locals() - with ops.name_scope(name, values=[concentration, rate]): - with ops.control_dependencies([ - check_ops.assert_positive(concentration), - check_ops.assert_positive(rate), - ] if validate_args else []): - self._concentration = array_ops.identity( - concentration, name="concentration") - self._rate = array_ops.identity(rate, name="rate") - check_ops.assert_same_float_dtype( - [self._concentration, self._rate]) - super(Gamma, self).__init__( - dtype=self._concentration.dtype, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - parameters=parameters, - graph_parents=[self._concentration, - self._rate], - name=name) - - @staticmethod - def _param_shapes(sample_shape): - return dict( - zip(("concentration", "rate"), ([ops.convert_to_tensor( - sample_shape, dtype=dtypes.int32)] * 2))) - - @property - def concentration(self): - """Concentration parameter.""" - return self._concentration - - @property - def rate(self): - """Rate parameter.""" - return self._rate - - def _batch_shape_tensor(self): - return array_ops.broadcast_dynamic_shape( - array_ops.shape(self.concentration), - array_ops.shape(self.rate)) - - def _batch_shape(self): - return array_ops.broadcast_static_shape( - self.concentration.get_shape(), - self.rate.get_shape()) - - def _event_shape_tensor(self): - return constant_op.constant([], dtype=dtypes.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - @distribution_util.AppendDocstring( - """Note: See `tf.random_gamma` docstring for sampling details and - caveats.""") - def _sample_n(self, n, seed=None): - return random_ops.random_gamma( - shape=[n], - alpha=self.concentration, - beta=self.rate, - dtype=self.dtype, - seed=seed) - - def _log_prob(self, x): - return self._log_unnormalized_prob(x) - self._log_normalization() - - def _prob(self, x): - return math_ops.exp(self._log_prob(x)) - - def _log_cdf(self, x): - return math_ops.log(self._cdf(x)) - - def _cdf(self, x): - x = self._maybe_assert_valid_sample(x) - # Note that igamma returns the regularized incomplete gamma function, - # which is what we want for the CDF. - return math_ops.igamma(self.concentration, self.rate * x) - - def _log_unnormalized_prob(self, x): - x = self._maybe_assert_valid_sample(x) - return (self.concentration - 1.) * math_ops.log(x) - self.rate * x - - def _log_normalization(self): - return (math_ops.lgamma(self.concentration) - - self.concentration * math_ops.log(self.rate)) - - def _entropy(self): - return (self.concentration - - math_ops.log(self.rate) - + math_ops.lgamma(self.concentration) - + ((1. - self.concentration) * - math_ops.digamma(self.concentration))) - - def _mean(self): - return self.concentration / self.rate - - def _variance(self): - return self.concentration / math_ops.square(self.rate) - - def _stddev(self): - return math_ops.sqrt(self.concentration) / self.rate - - @distribution_util.AppendDocstring( - """The mode of a gamma distribution is `(shape - 1) / rate` when - `shape > 1`, and `NaN` otherwise. If `self.allow_nan_stats` is `False`, - an exception will be raised rather than returning `NaN`.""") - def _mode(self): - mode = (self.concentration - 1.) 
/ self.rate - if self.allow_nan_stats: - nan = array_ops.fill( - self.batch_shape_tensor(), - np.array(np.nan, dtype=self.dtype.as_numpy_dtype()), - name="nan") - return array_ops.where(self.concentration > 1., mode, nan) - else: - return control_flow_ops.with_dependencies([ - check_ops.assert_less( - array_ops.ones([], self.dtype), - self.concentration, - message="mode not defined when any concentration <= 1"), - ], mode) - - def _maybe_assert_valid_sample(self, x): - check_ops.assert_same_float_dtype(tensors=[x], dtype=self.dtype) - if not self.validate_args: - return x - return control_flow_ops.with_dependencies([ - check_ops.assert_positive(x), - ], x) - - -class GammaWithSoftplusConcentrationRate(Gamma): - """`Gamma` with softplus of `concentration` and `rate`.""" - - def __init__(self, - concentration, - rate, - validate_args=False, - allow_nan_stats=True, - name="GammaWithSoftplusConcentrationRate"): - parameters = locals() - with ops.name_scope(name, values=[concentration, rate]): - super(GammaWithSoftplusConcentrationRate, self).__init__( - concentration=nn.softplus(concentration, - name="softplus_concentration"), - rate=nn.softplus(rate, name="softplus_rate"), - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - name=name) - self._parameters = parameters - - -@kullback_leibler.RegisterKL(Gamma, Gamma) -def _kl_gamma_gamma(g0, g1, name=None): - """Calculate the batched KL divergence KL(g0 || g1) with g0 and g1 Gamma. - - Args: - g0: instance of a Gamma distribution object. - g1: instance of a Gamma distribution object. - name: (optional) Name to use for created operations. - Default is "kl_gamma_gamma". - - Returns: - kl_gamma_gamma: `Tensor`. The batchwise KL(g0 || g1). - """ - with ops.name_scope(name, "kl_gamma_gamma", values=[ - g0.concentration, g0.rate, g1.concentration, g1.rate]): - # Result from: - # http://www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps - # For derivation see: - # http://stats.stackexchange.com/questions/11646/kullback-leibler-divergence-between-two-gamma-distributions pylint: disable=line-too-long - return (((g0.concentration - g1.concentration) - * math_ops.digamma(g0.concentration)) - + math_ops.lgamma(g1.concentration) - - math_ops.lgamma(g0.concentration) - + g1.concentration * math_ops.log(g0.rate) - - g1.concentration * math_ops.log(g1.rate) - + g0.concentration * (g1.rate / g0.rate - 1.)) diff --git a/tensorflow/contrib/distributions/python/ops/laplace.py b/tensorflow/contrib/distributions/python/ops/laplace.py deleted file mode 100644 index 5c964ff78a..0000000000 --- a/tensorflow/contrib/distributions/python/ops/laplace.py +++ /dev/null @@ -1,226 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ============================================================================== -"""The Laplace distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -import numpy as np - -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import special_math - - -__all__ = [ - "Laplace", - "LaplaceWithSoftplusScale", -] - - -class Laplace(distribution.Distribution): - """The Laplace distribution with location `loc` and `scale` parameters. - - #### Mathematical details - - The probability density function (pdf) of this distribution is, - - ```none - pdf(x; mu, sigma) = exp(-|x - mu| / sigma) / Z - Z = 2 sigma - ``` - - where `loc = mu`, `scale = sigma`, and `Z` is the normalization constant. - - Note that the Laplace distribution can be thought of as two exponential - distributions spliced together "back-to-back." - - The Laplace distribution is a member of the [location-scale family]( - https://en.wikipedia.org/wiki/Location-scale_family), i.e., it can be - constructed as, - - ```none - X ~ Laplace(loc=0, scale=1) - Y = loc + scale * X - ``` - - """ - - def __init__(self, - loc, - scale, - validate_args=False, - allow_nan_stats=True, - name="Laplace"): - """Construct Laplace distribution with parameters `loc` and `scale`. - - The parameters `loc` and `scale` must be shaped in a way that supports - broadcasting (e.g., `loc / scale` is a valid operation). - - Args: - loc: Floating point tensor which characterizes the location (center) - of the distribution. - scale: Positive floating point tensor which characterizes the spread of - the distribution. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, - statistics (e.g., mean, mode, variance) use the value "`NaN`" to - indicate the result is undefined. When `False`, an exception is raised - if one or more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - - Raises: - TypeError: if `loc` and `scale` are of different dtype.
- """ - parameters = locals() - with ops.name_scope(name, values=[loc, scale]): - with ops.control_dependencies([check_ops.assert_positive(scale)] if - validate_args else []): - self._loc = array_ops.identity(loc, name="loc") - self._scale = array_ops.identity(scale, name="scale") - check_ops.assert_same_float_dtype([self._loc, self._scale]) - super(Laplace, self).__init__( - dtype=self._loc.dtype, - reparameterization_type=distribution.FULLY_REPARAMETERIZED, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - parameters=parameters, - graph_parents=[self._loc, self._scale], - name=name) - - @staticmethod - def _param_shapes(sample_shape): - return dict( - zip(("loc", "scale"), ([ops.convert_to_tensor( - sample_shape, dtype=dtypes.int32)] * 2))) - - @property - def loc(self): - """Distribution parameter for the location.""" - return self._loc - - @property - def scale(self): - """Distribution parameter for scale.""" - return self._scale - - def _batch_shape_tensor(self): - return array_ops.broadcast_dynamic_shape( - array_ops.shape(self.loc), array_ops.shape(self.scale)) - - def _batch_shape(self): - return array_ops.broadcast_static_shape( - self.loc.get_shape(), self.scale.get_shape()) - - def _event_shape_tensor(self): - return constant_op.constant([], dtype=dtypes.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - def _sample_n(self, n, seed=None): - shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) - # Uniform variates must be sampled from the open-interval `(-1, 1)` rather - # than `[-1, 1)`. In the case of `(0, 1)` we'd use - # `np.finfo(self.dtype.as_numpy_dtype).tiny` because it is the smallest, - # positive, "normal" number. However, the concept of subnormality exists - # only at zero; here we need the smallest usable number larger than -1, - # i.e., `-1 + eps/2`. - uniform_samples = random_ops.random_uniform( - shape=shape, - minval=np.nextafter(self.dtype.as_numpy_dtype(-1.), - self.dtype.as_numpy_dtype(0.)), - maxval=1., - dtype=self.dtype, - seed=seed) - return (self.loc - self.scale * math_ops.sign(uniform_samples) * - math_ops.log1p(-math_ops.abs(uniform_samples))) - - def _log_prob(self, x): - return self._log_unnormalized_prob(x) - self._log_normalization() - - def _prob(self, x): - return math_ops.exp(self._log_prob(x)) - - def _log_cdf(self, x): - return special_math.log_cdf_laplace(self._z(x)) - - def _log_survival_function(self, x): - return special_math.log_cdf_laplace(-self._z(x)) - - def _cdf(self, x): - z = self._z(x) - return (0.5 + 0.5 * math_ops.sign(z) * - (1. - math_ops.exp(-math_ops.abs(z)))) - - def _log_unnormalized_prob(self, x): - return -math_ops.abs(self._z(x)) - - def _log_normalization(self): - return math.log(2.) + math_ops.log(self.scale) - - def _entropy(self): - # Use broadcasting rules to calculate the full broadcast scale. - scale = self.scale + array_ops.zeros_like(self.loc) - return math.log(2.) + 1. + math_ops.log(scale) - - def _mean(self): - return self.loc + array_ops.zeros_like(self.scale) - - def _stddev(self): - return math.sqrt(2.) 
* self.scale + array_ops.zeros_like(self.loc) - - def _median(self): - return self._mean() - - def _mode(self): - return self._mean() - - def _z(self, x): - return (x - self.loc) / self.scale - - -class LaplaceWithSoftplusScale(Laplace): - """Laplace with softplus applied to `scale`.""" - - def __init__(self, - loc, - scale, - validate_args=False, - allow_nan_stats=True, - name="LaplaceWithSoftplusScale"): - parameters = locals() - with ops.name_scope(name, values=[loc, scale]): - super(LaplaceWithSoftplusScale, self).__init__( - loc=loc, - scale=nn.softplus(scale, name="softplus_scale"), - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - name=name) - self._parameters = parameters diff --git a/tensorflow/contrib/distributions/python/ops/mixture.py b/tensorflow/contrib/distributions/python/ops/mixture.py index 6d318014ad..f3b09f60f3 100644 --- a/tensorflow/contrib/distributions/python/ops/mixture.py +++ b/tensorflow/contrib/distributions/python/ops/mixture.py @@ -20,7 +20,6 @@ from __future__ import print_function import numpy as np -from tensorflow.contrib.distributions.python.ops import categorical from tensorflow.python.framework import ops from tensorflow.python.framework import tensor_shape from tensorflow.python.framework import tensor_util @@ -29,6 +28,7 @@ from tensorflow.python.ops import check_ops from tensorflow.python.ops import data_flow_ops from tensorflow.python.ops import math_ops from tensorflow.python.ops import nn_ops +from tensorflow.python.ops.distributions import categorical from tensorflow.python.ops.distributions import distribution from tensorflow.python.ops.distributions import util as distribution_util diff --git a/tensorflow/contrib/distributions/python/ops/multinomial.py b/tensorflow/contrib/distributions/python/ops/multinomial.py deleted file mode 100644 index a5bea7b4ba..0000000000 --- a/tensorflow/contrib/distributions/python/ops/multinomial.py +++ /dev/null @@ -1,291 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Multinomial distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import util as distribution_util - - -__all__ = [ - "Multinomial", -] - - -_multinomial_sample_note = """For each batch of counts, `value = [n_0, ... 
-,n_{k-1}]`, `P[value]` is the probability that after sampling `self.total_count` -draws from this Multinomial distribution, the number of draws falling in class -`j` is `n_j`. Since this definition is [exchangeable]( -https://en.wikipedia.org/wiki/Exchangeable_random_variables); different -sequences have the same counts so the probability includes a combinatorial -coefficient. - -Note: `value` must be a non-negative tensor with dtype `self.dtype`, have no -fractional components, and such that -`tf.reduce_sum(value, -1) = self.total_count`. Its shape must be broadcastable -with `self.probs` and `self.total_count`.""" - - -class Multinomial(distribution.Distribution): - """Multinomial distribution. - - This Multinomial distribution is parameterized by `probs`, a (batch of) - length-`k` `prob` (probability) vectors (`k > 1`) such that - `tf.reduce_sum(probs, -1) = 1`, and a `total_count` number of trials, i.e., - the number of trials per draw from the Multinomial. It is defined over a - (batch of) length-`k` vector `counts` such that - `tf.reduce_sum(counts, -1) = total_count`. The Multinomial is identically the - Binomial distribution when `k = 2`. - - #### Mathematical Details - - The Multinomial is a distribution over `k`-class counts, i.e., a length-`k` - vector of non-negative integer `counts = n = [n_0, ..., n_{k-1}]`. - - The probability mass function (pmf) is, - - ```none - pmf(n; pi, N) = prod_j (pi_j)**n_j / Z - Z = (prod_j n_j!) / N! - ``` - - where: - * `probs = pi = [pi_0, ..., pi_{k-1}]`, `pi_j > 0`, `sum_j pi_j = 1`, - * `total_count = N`, `N` a positive integer, - * `Z` is the normalization constant, and, - * `N!` denotes `N` factorial. - - Distribution parameters are automatically broadcast in all functions; see - examples for details. - - #### Examples - - Create a 3-class distribution, with the 3rd class is most likely to be drawn, - using logits. - - ```python - logits = [-50., -43, 0] - dist = Multinomial(total_count=4., logits=logits) - ``` - - Create a 3-class distribution, with the 3rd class is most likely to be drawn. - - ```python - p = [.2, .3, .5] - dist = Multinomial(total_count=4., probs=p) - ``` - - The distribution functions can be evaluated on counts. - - ```python - # counts same shape as p. - counts = [1., 0, 3] - dist.prob(counts) # Shape [] - - # p will be broadcast to [[.2, .3, .5], [.2, .3, .5]] to match counts. - counts = [[1., 2, 1], [2, 2, 0]] - dist.prob(counts) # Shape [2] - - # p will be broadcast to shape [5, 7, 3] to match counts. - counts = [[...]] # Shape [5, 7, 3] - dist.prob(counts) # Shape [5, 7] - ``` - - Create a 2-batch of 3-class distributions. - - ```python - p = [[.1, .2, .7], [.3, .3, .4]] # Shape [2, 3] - dist = Multinomial(total_count=[4., 5], probs=p) - - counts = [[2., 1, 1], [3, 1, 1]] - dist.prob(counts) # Shape [2] - ``` - """ - - def __init__(self, - total_count, - logits=None, - probs=None, - validate_args=False, - allow_nan_stats=True, - name="Multinomial"): - """Initialize a batch of Multinomial distributions. - - Args: - total_count: Non-negative floating point tensor with shape broadcastable - to `[N1,..., Nm]` with `m >= 0`. Defines this as a batch of - `N1 x ... x Nm` different Multinomial distributions. Its components - should be equal to integer values. - logits: Floating point tensor representing the log-odds of a - positive event with shape broadcastable to `[N1,..., Nm, k], m >= 0`, - and the same dtype as `total_count`. Defines this as a batch of - `N1 x ... x Nm` different `k` class Multinomial distributions. 
Only one - of `logits` or `probs` should be passed in. - probs: Positive floating point tensor with shape broadcastable to - `[N1,..., Nm, k]` `m >= 0` and same dtype as `total_count`. Defines - this as a batch of `N1 x ... x Nm` different `k` class Multinomial - distributions. `probs`'s components in the last portion of its shape - should sum to `1`. Only one of `logits` or `probs` should be passed in. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - """ - parameters = locals() - with ops.name_scope(name, values=[total_count, logits, probs]): - self._total_count = self._maybe_assert_valid_total_count( - ops.convert_to_tensor(total_count, name="total_count"), - validate_args) - self._logits, self._probs = distribution_util.get_logits_and_probs( - logits=logits, - probs=probs, - multidimensional=True, - validate_args=validate_args, - name=name) - self._mean_val = self._total_count[..., array_ops.newaxis] * self._probs - super(Multinomial, self).__init__( - dtype=self._probs.dtype, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - parameters=parameters, - graph_parents=[self._total_count, - self._logits, - self._probs], - name=name) - - @property - def total_count(self): - """Number of trials used to construct a sample.""" - return self._total_count - - @property - def logits(self): - """Vector of coordinatewise logits.""" - return self._logits - - @property - def probs(self): - """Probability of of drawing a `1` in that coordinate.""" - return self._probs - - def _batch_shape_tensor(self): - return array_ops.shape(self._mean_val)[:-1] - - def _batch_shape(self): - return self._mean_val.get_shape().with_rank_at_least(1)[:-1] - - def _event_shape_tensor(self): - return array_ops.shape(self._mean_val)[-1:] - - def _event_shape(self): - return self._mean_val.get_shape().with_rank_at_least(1)[-1:] - - def _sample_n(self, n, seed=None): - n_draws = math_ops.cast(self.total_count, dtype=dtypes.int32) - if self.total_count.get_shape().ndims is not None: - if self.total_count.get_shape().ndims != 0: - raise NotImplementedError( - "Sample only supported for scalar number of draws.") - elif self.validate_args: - is_scalar = check_ops.assert_rank( - n_draws, 0, - message="Sample only supported for scalar number of draws.") - n_draws = control_flow_ops.with_dependencies([is_scalar], n_draws) - k = self.event_shape_tensor()[0] - # Flatten batch dims so logits has shape [B, k], - # where B = reduce_prod(self.batch_shape_tensor()). 
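The draw-and-tally scheme used below (sample categorical indices, one-hot them, and sum) mirrors the definition of the Multinomial; a minimal NumPy sketch for a single scalar-batch draw (values arbitrary):

```python
import numpy as np

probs = np.array([.2, .3, .5])
total_count = 4
# One Multinomial draw is total_count categorical draws tallied per class.
draws = np.random.choice(len(probs), size=total_count, p=probs)
counts = np.bincount(draws, minlength=len(probs))
print(counts, counts.sum())  # e.g. [0 2 2] 4
```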
- draws = random_ops.multinomial( - logits=array_ops.reshape(self.logits, [-1, k]), - num_samples=n * n_draws, - seed=seed) - draws = array_ops.reshape(draws, shape=[-1, n, n_draws]) - x = math_ops.reduce_sum(array_ops.one_hot(draws, depth=k), - axis=-2) # shape: [B, n, k] - x = array_ops.transpose(x, perm=[1, 0, 2]) - final_shape = array_ops.concat([[n], self.batch_shape_tensor(), [k]], 0) - return array_ops.reshape(x, final_shape) - - @distribution_util.AppendDocstring(_multinomial_sample_note) - def _log_prob(self, counts): - return self._log_unnormalized_prob(counts) - self._log_normalization(counts) - - @distribution_util.AppendDocstring(_multinomial_sample_note) - def _prob(self, counts): - return math_ops.exp(self._log_prob(counts)) - - def _log_unnormalized_prob(self, counts): - counts = self._maybe_assert_valid_sample(counts) - return math_ops.reduce_sum(counts * math_ops.log(self.probs), -1) - - def _log_normalization(self, counts): - counts = self._maybe_assert_valid_sample(counts) - return -distribution_util.log_combinations(self.total_count, counts) - - def _mean(self): - return array_ops.identity(self._mean_val) - - def _covariance(self): - p = self.probs * array_ops.ones_like( - self.total_count)[..., array_ops.newaxis] - return array_ops.matrix_set_diag( - -math_ops.matmul(self._mean_val[..., array_ops.newaxis], - p[..., array_ops.newaxis, :]), # outer product - self._variance()) - - def _variance(self): - p = self.probs * array_ops.ones_like( - self.total_count)[..., array_ops.newaxis] - return self._mean_val - self._mean_val * p - - def _maybe_assert_valid_total_count(self, total_count, validate_args): - if not validate_args: - return total_count - return control_flow_ops.with_dependencies([ - check_ops.assert_non_negative( - total_count, - message="total_count must be non-negative."), - distribution_util.assert_integer_form( - total_count, - message="total_count cannot contain fractional values."), - ], total_count) - - def _maybe_assert_valid_sample(self, counts): - """Check counts for proper shape, values, then return tensor version.""" - if not self.validate_args: - return counts - - counts = distribution_util.embed_check_nonnegative_discrete( - counts, check_integer=True) - return control_flow_ops.with_dependencies([ - check_ops.assert_equal( - self.total_count, math_ops.reduce_sum(counts, -1), - message="counts must sum to `self.total_count`"), - ], counts) diff --git a/tensorflow/contrib/distributions/python/ops/student_t.py b/tensorflow/contrib/distributions/python/ops/student_t.py deleted file mode 100644 index 7872569a2b..0000000000 --- a/tensorflow/contrib/distributions/python/ops/student_t.py +++ /dev/null @@ -1,360 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ============================================================================== -"""Student's t distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import control_flow_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import nn -from tensorflow.python.ops import random_ops -from tensorflow.python.ops import special_math_ops -from tensorflow.python.ops.distributions import distribution -from tensorflow.python.ops.distributions import util as distribution_util - - -__all__ = [ - "StudentT", - "StudentTWithAbsDfSoftplusScale", -] - - -class StudentT(distribution.Distribution): - # pylint: disable=line-too-long - """Student's t-distribution with degree of freedom `df`, location `loc`, and `scale` parameters. - - #### Mathematical details - - The probability density function (pdf) is, - - ```none - pdf(x; df, mu, sigma) = (1 + y**2 / df)**(-0.5 (df + 1)) / Z - where, - y = (x - mu) / sigma - Z = abs(sigma) sqrt(df pi) Gamma(0.5 df) / Gamma(0.5 (df + 1)) - ``` - - where: - * `loc = mu`, - * `scale = sigma`, and, - * `Z` is the normalization constant, and, - * `Gamma` is the [gamma function]( - https://en.wikipedia.org/wiki/Gamma_function). - - The StudentT distribution is a member of the [location-scale family]( - https://en.wikipedia.org/wiki/Location-scale_family), i.e., it can be - constructed as, - - ```none - X ~ StudentT(df, loc=0, scale=1) - Y = loc + scale * X - ``` - - Notice that `scale` has semantics more similar to standard deviation than - variance. However it is not actually the std. deviation; the Student's - t-distribution std. dev. is `scale sqrt(df / (df - 2))` when `df > 2`. - - #### Examples - - Examples of initialization of one or a batch of distributions. - - ```python - # Define a single scalar Student t distribution. - single_dist = tf.contrib.distributions.StudentT(df=3) - - # Evaluate the pdf at 1, returning a scalar Tensor. - single_dist.prob(1.) - - # Define a batch of two scalar valued Student t's. - # The first has degrees of freedom 2, mean 1, and scale 11. - # The second 3, 2 and 22. - multi_dist = tf.contrib.distributions.StudentT(df=[2, 3], - loc=[1, 2.], - scale=[11, 22.]) - - # Evaluate the pdf of the first distribution on 0, and the second on 1.5, - # returning a length two tensor. - multi_dist.prob([0, 1.5]) - - # Get 3 samples, returning a 3 x 2 tensor. - multi_dist.sample(3) - ``` - - Arguments are broadcast when possible. - - ```python - # Define a batch of two Student's t distributions. - # Both have df 2 and mean 1, but different scales. - dist = tf.contrib.distributions.StudentT(df=2, loc=1, scale=[11, 22.]) - - # Evaluate the pdf of both distributions on the same point, 3.0, - # returning a length 2 tensor. - dist.prob(3.0) - ``` - - """ - # pylint: enable=line-too-long - - def __init__(self, - df, - loc, - scale, - validate_args=False, - allow_nan_stats=True, - name="StudentT"): - """Construct Student's t distributions. - - The distributions have degree of freedom `df`, mean `loc`, and scale - `scale`. - - The parameters `df`, `loc`, and `scale` must be shaped in a way that - supports broadcasting (e.g. 
`df + loc + scale` is a valid operation). - - Args: - df: Floating-point `Tensor`. The degrees of freedom of the - distribution(s). `df` must contain only positive values. - loc: Floating-point `Tensor`. The mean(s) of the distribution(s). - scale: Floating-point `Tensor`. The scaling factor(s) for the - distribution(s). Note that `scale` is not technically the standard - deviation of this distribution but has semantics more similar to - standard deviation than variance. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, - statistics (e.g., mean, mode, variance) use the value "`NaN`" to - indicate the result is undefined. When `False`, an exception is raised - if one or more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - - Raises: - TypeError: if loc and scale are different dtypes. - """ - parameters = locals() - with ops.name_scope(name, values=[df, loc, scale]): - with ops.control_dependencies([check_ops.assert_positive(df)] - if validate_args else []): - self._df = array_ops.identity(df, name="df") - self._loc = array_ops.identity(loc, name="loc") - self._scale = array_ops.identity(scale, name="scale") - check_ops.assert_same_float_dtype( - (self._df, self._loc, self._scale)) - super(StudentT, self).__init__( - dtype=self._scale.dtype, - reparameterization_type=distribution.NOT_REPARAMETERIZED, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - parameters=parameters, - graph_parents=[self._df, self._loc, self._scale], - name=name) - - @staticmethod - def _param_shapes(sample_shape): - return dict( - zip(("df", "loc", "scale"), ( - [ops.convert_to_tensor( - sample_shape, dtype=dtypes.int32)] * 3))) - - @property - def df(self): - """Degrees of freedom in these Student's t distribution(s).""" - return self._df - - @property - def loc(self): - """Locations of these Student's t distribution(s).""" - return self._loc - - @property - def scale(self): - """Scaling factors of these Student's t distribution(s).""" - return self._scale - - def _batch_shape_tensor(self): - return array_ops.broadcast_dynamic_shape( - array_ops.shape(self.df), - array_ops.broadcast_dynamic_shape( - array_ops.shape(self.loc), array_ops.shape(self.scale))) - - def _batch_shape(self): - return array_ops.broadcast_static_shape( - array_ops.broadcast_static_shape(self.df.get_shape(), - self.loc.get_shape()), - self.scale.get_shape()) - - def _event_shape_tensor(self): - return constant_op.constant([], dtype=math_ops.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - def _sample_n(self, n, seed=None): - # The sampling method comes from the fact that if: - # X ~ Normal(0, 1) - # Z ~ Chi2(df) - # Y = X / sqrt(Z / df) - # then: - # Y ~ StudentT(df). - shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) - normal_sample = random_ops.random_normal(shape, dtype=self.dtype, seed=seed) - df = self.df * array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype) - gamma_sample = random_ops.random_gamma( - [n], - 0.5 * df, - beta=0.5, - dtype=self.dtype, - seed=distribution_util.gen_new_seed(seed, salt="student_t")) - samples = normal_sample * math_ops.rsqrt(gamma_sample / df) - return samples * self.scale + self.loc # Abs(scale) not wanted. 
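The sampling identity in the comment above (`X ~ Normal(0, 1)`, `Z ~ Chi2(df)`, `Y = X / sqrt(Z / df)` gives `Y ~ StudentT(df)`) is easy to sanity-check outside the graph; a minimal NumPy sketch with arbitrary parameter values:

```python
import numpy as np

df, loc, scale = 5., 1., 2.
n = 100000
x = np.random.normal(size=n)         # X ~ Normal(0, 1)
z = np.random.chisquare(df, size=n)  # Z ~ Chi2(df)
y = x / np.sqrt(z / df)              # Y ~ StudentT(df)
samples = loc + scale * y
print(samples.mean())  # Close to loc = 1 (defined since df > 1).
print(samples.var())   # Close to scale**2 * df / (df - 2) ~= 6.67.
```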
- - def _log_prob(self, x): - return self._log_unnormalized_prob(x) - self._log_normalization() - - def _log_unnormalized_prob(self, x): - y = (x - self.loc) / self.scale # Abs(scale) superfluous. - return -0.5 * (self.df + 1.) * math_ops.log1p(y**2. / self.df) - - def _log_normalization(self): - return (math_ops.log(math_ops.abs(self.scale)) + - 0.5 * math_ops.log(self.df) + - 0.5 * np.log(np.pi) + - math_ops.lgamma(0.5 * self.df) - - math_ops.lgamma(0.5 * (self.df + 1.))) - - def _prob(self, x): - return math_ops.exp(self._log_prob(x)) - - def _cdf(self, x): - # Take Abs(scale) to make subsequent where work correctly. - y = (x - self.loc) / math_ops.abs(self.scale) - x_t = self.df / (y**2. + self.df) - neg_cdf = 0.5 * math_ops.betainc(0.5 * self.df, 0.5, x_t) - return array_ops.where(math_ops.less(y, 0.), neg_cdf, 1. - neg_cdf) - - def _entropy(self): - v = array_ops.ones(self.batch_shape_tensor(), - dtype=self.dtype)[..., array_ops.newaxis] - u = v * self.df[..., array_ops.newaxis] - beta_arg = array_ops.concat([u, v], -1) / 2. - return (math_ops.log(math_ops.abs(self.scale)) + - 0.5 * math_ops.log(self.df) + - special_math_ops.lbeta(beta_arg) + - 0.5 * (self.df + 1.) * - (math_ops.digamma(0.5 * (self.df + 1.)) - - math_ops.digamma(0.5 * self.df))) - - @distribution_util.AppendDocstring( - """The mean of Student's T equals `loc` if `df > 1`, otherwise it is - `NaN`. If `self.allow_nan_stats=True`, then an exception will be raised - rather than returning `NaN`.""") - def _mean(self): - mean = self.loc * array_ops.ones(self.batch_shape_tensor(), - dtype=self.dtype) - if self.allow_nan_stats: - nan = np.array(np.nan, dtype=self.dtype.as_numpy_dtype()) - return array_ops.where( - math_ops.greater( - self.df, - array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype)), - mean, - array_ops.fill(self.batch_shape_tensor(), nan, name="nan")) - else: - return control_flow_ops.with_dependencies( - [ - check_ops.assert_less( - array_ops.ones([], dtype=self.dtype), - self.df, - message="mean not defined for components of df <= 1"), - ], - mean) - - @distribution_util.AppendDocstring(""" - The variance for Student's T equals - - ``` - df / (df - 2), when df > 2 - infinity, when 1 < df <= 2 - NaN, when df <= 1 - ``` - """) - def _variance(self): - # We need to put the tf.where inside the outer tf.where to ensure we never - # hit a NaN in the gradient. - denom = array_ops.where(math_ops.greater(self.df, 2.), - self.df - 2., - array_ops.ones_like(self.df)) - # Abs(scale) superfluous. - var = (array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype) * - math_ops.square(self.scale) * self.df / denom) - # When 1 < df <= 2, variance is infinite. 
- inf = np.array(np.inf, dtype=self.dtype.as_numpy_dtype()) - result_where_defined = array_ops.where( - self.df > array_ops.fill(self.batch_shape_tensor(), 2.), - var, - array_ops.fill(self.batch_shape_tensor(), inf, name="inf")) - - if self.allow_nan_stats: - nan = np.array(np.nan, dtype=self.dtype.as_numpy_dtype()) - return array_ops.where( - math_ops.greater( - self.df, - array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype)), - result_where_defined, - array_ops.fill(self.batch_shape_tensor(), nan, name="nan")) - else: - return control_flow_ops.with_dependencies( - [ - check_ops.assert_less( - array_ops.ones([], dtype=self.dtype), - self.df, - message="variance not defined for components of df <= 1"), - ], - result_where_defined) - - def _mode(self): - return array_ops.identity(self.loc) - - -class StudentTWithAbsDfSoftplusScale(StudentT): - """StudentT with `df = floor(abs(df))` and `scale = softplus(scale)`.""" - - def __init__(self, - df, - loc, - scale, - validate_args=False, - allow_nan_stats=True, - name="StudentTWithAbsDfSoftplusScale"): - parameters = locals() - with ops.name_scope(name, values=[df, scale]): - super(StudentTWithAbsDfSoftplusScale, self).__init__( - df=math_ops.floor(math_ops.abs(df)), - loc=loc, - scale=nn.softplus(scale, name="softplus_scale"), - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - name=name) - self._parameters = parameters diff --git a/tensorflow/contrib/distributions/python/ops/uniform.py b/tensorflow/contrib/distributions/python/ops/uniform.py deleted file mode 100644 index 9b555f87ea..0000000000 --- a/tensorflow/contrib/distributions/python/ops/uniform.py +++ /dev/null @@ -1,202 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""The Uniform distribution class.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -from tensorflow.python.framework import constant_op -from tensorflow.python.framework import dtypes -from tensorflow.python.framework import ops -from tensorflow.python.framework import tensor_shape -from tensorflow.python.ops import array_ops -from tensorflow.python.ops import check_ops -from tensorflow.python.ops import math_ops -from tensorflow.python.ops import random_ops -from tensorflow.python.ops.distributions import distribution - - -class Uniform(distribution.Distribution): - """Uniform distribution with `low` and `high` parameters. - - #### Mathematical Details - - The probability density function (pdf) is, - - ```none - pdf(x; a, b) = I[a <= x < b] / Z - Z = b - a - ``` - - where: - * `low = a`, - * `high = b`, - * `Z` is the normalizing constant, and, - * `I[predicate]` is the [indicator function]( - https://en.wikipedia.org/wiki/Indicator_function) for `predicate`. 
- - The parameters `low` and `high` must be shaped in a way that supports - broadcasting (e.g., `high - low` is a valid operation). - - #### Examples - - ```python - # Without broadcasting: - u1 = Uniform(low=3.0, high=4.0) # a single uniform distribution [3, 4] - u2 = Uniform(low=[1.0, 2.0], - high=[3.0, 4.0]) # 2 distributions [1, 3], [2, 4] - u3 = Uniform(low=[[1.0, 2.0], - [3.0, 4.0]], - high=[[1.5, 2.5], - [3.5, 4.5]]) # 4 distributions - ``` - - ```python - # With broadcasting: - u1 = Uniform(low=3.0, high=[5.0, 6.0, 7.0]) # 3 distributions - ``` - - """ - - def __init__(self, - low=0., - high=1., - validate_args=False, - allow_nan_stats=True, - name="Uniform"): - """Initialize a batch of Uniform distributions. - - Args: - low: Floating point tensor, lower boundary of the output interval. Must - have `low < high`. - high: Floating point tensor, upper boundary of the output interval. Must - have `low < high`. - validate_args: Python `bool`, default `False`. When `True` distribution - parameters are checked for validity despite possibly degrading runtime - performance. When `False` invalid inputs may silently render incorrect - outputs. - allow_nan_stats: Python `bool`, default `True`. When `True`, statistics - (e.g., mean, mode, variance) use the value "`NaN`" to indicate the - result is undefined. When `False`, an exception is raised if one or - more of the statistic's batch members are undefined. - name: Python `str` name prefixed to Ops created by this class. - - Raises: - InvalidArgumentError: if `low >= high` and `validate_args=False`. - """ - parameters = locals() - with ops.name_scope(name, values=[low, high]): - with ops.control_dependencies([ - check_ops.assert_less( - low, high, message="uniform not defined when low >= high.") - ] if validate_args else []): - self._low = array_ops.identity(low, name="low") - self._high = array_ops.identity(high, name="high") - check_ops.assert_same_float_dtype([self._low, self._high]) - super(Uniform, self).__init__( - dtype=self._low.dtype, - reparameterization_type=distribution.FULLY_REPARAMETERIZED, - validate_args=validate_args, - allow_nan_stats=allow_nan_stats, - parameters=parameters, - graph_parents=[self._low, - self._high], - name=name) - - @staticmethod - def _param_shapes(sample_shape): - return dict( - zip(("low", "high"), - ([ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)] * 2))) - - @property - def low(self): - """Lower boundary of the output interval.""" - return self._low - - @property - def high(self): - """Upper boundary of the output interval.""" - return self._high - - def range(self, name="range"): - """`high - low`.""" - with self._name_scope(name): - return self.high - self.low - - def _batch_shape_tensor(self): - return array_ops.broadcast_dynamic_shape( - array_ops.shape(self.low), - array_ops.shape(self.high)) - - def _batch_shape(self): - return array_ops.broadcast_static_shape( - self.low.get_shape(), - self.high.get_shape()) - - def _event_shape_tensor(self): - return constant_op.constant([], dtype=dtypes.int32) - - def _event_shape(self): - return tensor_shape.scalar() - - def _sample_n(self, n, seed=None): - shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) - samples = random_ops.random_uniform(shape=shape, - dtype=self.dtype, - seed=seed) - return self.low + self.range() * samples - - def _log_prob(self, x): - return math_ops.log(self._prob(x)) - - def _prob(self, x): - broadcasted_x = x * array_ops.ones(self.batch_shape_tensor()) - return array_ops.where( - 
math_ops.is_nan(broadcasted_x), - broadcasted_x, - array_ops.where( - math_ops.logical_or(broadcasted_x < self.low, - broadcasted_x >= self.high), - array_ops.zeros_like(broadcasted_x), - array_ops.ones_like(broadcasted_x) / self.range())) - - def _log_cdf(self, x): - return math_ops.log(self.cdf(x)) - - def _cdf(self, x): - broadcast_shape = array_ops.broadcast_dynamic_shape( - array_ops.shape(x), self.batch_shape_tensor()) - zeros = array_ops.zeros(broadcast_shape, dtype=self.dtype) - ones = array_ops.ones(broadcast_shape, dtype=self.dtype) - broadcasted_x = x * ones - result_if_not_big = array_ops.where( - x < self.low, zeros, (broadcasted_x - self.low) / self.range()) - return array_ops.where(x >= self.high, ones, result_if_not_big) - - def _entropy(self): - return math_ops.log(self.range()) - - def _mean(self): - return (self.low + self.high) / 2. - - def _variance(self): - return math_ops.square(self.range()) / 12. - - def _stddev(self): - return self.range() / math.sqrt(12.) diff --git a/tensorflow/contrib/distributions/python/ops/vector_student_t.py b/tensorflow/contrib/distributions/python/ops/vector_student_t.py index d7115f6f0b..299ff36962 100644 --- a/tensorflow/contrib/distributions/python/ops/vector_student_t.py +++ b/tensorflow/contrib/distributions/python/ops/vector_student_t.py @@ -19,13 +19,13 @@ from __future__ import division from __future__ import print_function from tensorflow.contrib.distributions.python.ops import bijectors -from tensorflow.contrib.distributions.python.ops import student_t from tensorflow.contrib.distributions.python.ops import transformed_distribution from tensorflow.python.framework import constant_op from tensorflow.python.framework import dtypes from tensorflow.python.framework import ops from tensorflow.python.framework import tensor_shape from tensorflow.python.ops import array_ops +from tensorflow.python.ops.distributions import student_t from tensorflow.python.ops.distributions import util as distribution_util diff --git a/tensorflow/contrib/learn/python/learn/estimators/head.py b/tensorflow/contrib/learn/python/learn/estimators/head.py index 452f8a901e..15e457f932 100644 --- a/tensorflow/contrib/learn/python/learn/estimators/head.py +++ b/tensorflow/contrib/learn/python/learn/estimators/head.py @@ -921,12 +921,21 @@ def _softmax_cross_entropy_loss(labels, logits, weights=None): if not labels.dtype.is_integer: raise ValueError("Labels dtype should be integer " "Instead got %s." % labels.dtype) - # TODO(ptucker): This will break for dynamic shapes. + # sparse_softmax_cross_entropy_with_logits requires [batch_size] labels. + is_squeezed_labels = False + # TODO(ptucker): This will break for dynamic shapes. if len(labels.get_shape()) == 2: labels = array_ops.squeeze(labels, squeeze_dims=(1,)) + is_squeezed_labels = True + loss = nn.sparse_softmax_cross_entropy_with_logits( labels=labels, logits=logits, name=name) + + # Restore squeezed dimension, if necessary, so loss matches weights shape. 
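The restored dimension matters because `weights` may be rank-2 (e.g. `[batch_size, 1]`) while the cross-entropy loss comes back rank-1; a small NumPy sketch of the broadcasting mismatch this avoids (shapes illustrative):

```python
import numpy as np

loss = np.array([.3, .7, .1])            # [batch_size], squeezed per-example loss
weights = np.array([[1.], [.5], [2.]])   # [batch_size, 1], 2-D weights

# Multiplying a [3] loss by [3, 1] weights broadcasts to [3, 3]:
print((loss * weights).shape)                 # (3, 3) -- unintended
# With the singleton dimension restored, the weighting stays elementwise:
print((loss[:, np.newaxis] * weights).shape)  # (3, 1) -- intended
```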
+ if is_squeezed_labels: + loss = array_ops.expand_dims(loss, axis=(1,)) + return _compute_weighted_loss(loss, weights) diff --git a/tensorflow/contrib/learn/python/learn/estimators/head_test.py b/tensorflow/contrib/learn/python/learn/estimators/head_test.py index 442530cb5e..207a189a94 100644 --- a/tensorflow/contrib/learn/python/learn/estimators/head_test.py +++ b/tensorflow/contrib/learn/python/learn/estimators/head_test.py @@ -791,7 +791,7 @@ class BinaryClassificationHeadTest(test.TestCase): [b"0", b"1"], predicted_classes[0]) self.assertIn("probabilities", six.iterkeys(predictions_for_serving)) - def testBinaryClassificationInferMode_withWightColumn(self): + def testBinaryClassificationInferMode_withWeightColumn(self): n_classes = 2 head = head_lib.multi_class_head(n_classes=n_classes, weight_column_name="label_weight") @@ -951,7 +951,7 @@ class MultiClassHeadTest(test.TestCase): def setUp(self): self._logits = ((1., 0., 0.),) - self._labels = (2,) + self._labels = ((2,),) def _expected_eval_metrics(self, expected_loss): return { @@ -1131,7 +1131,7 @@ class MultiClassHeadTest(test.TestCase): _assert_metrics(self, expected_loss, expected_eval_metrics, model_fn_ops) - def testMultiClassWithWeight(self): + def testMultiClassWithScalarWeight(self): n_classes = 3 head = head_lib.multi_class_head( n_classes=n_classes, @@ -1154,6 +1154,30 @@ class MultiClassHeadTest(test.TestCase): _assert_metrics(self, expected_loss * weight, self._expected_eval_metrics(expected_loss), model_fn_ops) + def testMultiClassWith2DWeight(self): + n_classes = 3 + head = head_lib.multi_class_head( + n_classes=n_classes, + weight_column_name="label_weight", + metric_class_ids=range(n_classes)) + with ops.Graph().as_default(), session.Session(): + weight = .1 + weights = ((weight,),) + # logloss: z:label, x:logit + # z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x)) + model_fn_ops = head.create_model_fn_ops( + features={"label_weight": weights}, + labels=self._labels, + mode=model_fn.ModeKeys.TRAIN, + train_op_fn=head_lib.no_op_train_fn, + logits=self._logits) + self._assert_output_alternatives(model_fn_ops) + _assert_no_variables(self) + _assert_summary_tags(self, ["loss"]) + expected_loss = 1.5514447 + _assert_metrics(self, expected_loss * weight, + self._expected_eval_metrics(expected_loss), model_fn_ops) + def testMultiClassWithCustomLoss(self): n_classes = 3 head = head_lib.multi_class_head( diff --git a/tensorflow/contrib/learn/python/learn/estimators/run_config.py b/tensorflow/contrib/learn/python/learn/estimators/run_config.py index 109c8d25e1..5a63ee7fa8 100644 --- a/tensorflow/contrib/learn/python/learn/estimators/run_config.py +++ b/tensorflow/contrib/learn/python/learn/estimators/run_config.py @@ -31,6 +31,17 @@ from tensorflow.python.estimator import run_config as core_run_config from tensorflow.python.training import server_lib +_DEFAULT_UID_WHITE_LIST = [ + 'tf_random_seed', + 'save_summary_steps', + 'save_checkpoints_steps', + 'save_checkpoints_secs', + 'session_config', + 'keep_checkpoint_max', + 'keep_checkpoint_every_n_hours', +] + + class Environment(object): # For running general distributed training. CLOUD = 'cloud' @@ -312,18 +323,29 @@ class RunConfig(ClusterConfig, core_run_config.RunConfig): return new_copy @experimental - def uid(self): + def uid(self, whitelist=None): """Generates a 'Unique Identifier' based on all internal fields. 
Caller should use the uid string to check `RunConfig` instance integrity in one session use, but should not rely on the implementation details, which are subject to change. + Args: + whitelist: A list of the string names of the properties uid should not + include. If `None`, defaults to `_DEFAULT_UID_WHITE_LIST`, which + includes most properties the user is allowed to change. + Returns: A uid string. """ - # TODO(b/33295821): Allows user to specify a whitelist. + if whitelist is None: + whitelist = _DEFAULT_UID_WHITE_LIST + state = {k: v for k, v in self.__dict__.items() if not k.startswith('__')} + # Pop out the keys in whitelist. + for k in whitelist: + state.pop('_' + k, None) + ordered_state = collections.OrderedDict( sorted(state.items(), key=lambda t: t[0])) # For class instance without __repr__, some special cares are required. diff --git a/tensorflow/contrib/learn/python/learn/estimators/run_config_test.py b/tensorflow/contrib/learn/python/learn/estimators/run_config_test.py index 14cef7cc43..6d39a9ad13 100644 --- a/tensorflow/contrib/learn/python/learn/estimators/run_config_test.py +++ b/tensorflow/contrib/learn/python/learn/estimators/run_config_test.py @@ -257,6 +257,51 @@ class RunConfigTest(test.TestCase): self.assertNotEqual(expected_uid, new_config.uid()) self.assertEqual(ANOTHER_TEST_DIR, new_config.model_dir) + def test_uid_for_whitelist(self): + whitelist = ["model_dir"] + config = run_config_lib.RunConfig( + tf_random_seed=RANDOM_SEED, model_dir=TEST_DIR) + + expected_uid = config.uid(whitelist) + self.assertEqual(expected_uid, config.uid(whitelist)) + + new_config = config.replace(model_dir=ANOTHER_TEST_DIR) + self.assertEqual(TEST_DIR, config.model_dir) + self.assertEqual(expected_uid, new_config.uid(whitelist)) + self.assertEqual(ANOTHER_TEST_DIR, new_config.model_dir) + + def test_uid_for_default_whitelist(self): + config = run_config_lib.RunConfig( + tf_random_seed=11, + save_summary_steps=12, + save_checkpoints_steps=13, + save_checkpoints_secs=14, + session_config=15, + keep_checkpoint_max=16, + keep_checkpoint_every_n_hours=17) + self.assertEqual(11, config.tf_random_seed) + self.assertEqual(12, config.save_summary_steps) + self.assertEqual(13, config.save_checkpoints_steps) + self.assertEqual(14, config.save_checkpoints_secs) + self.assertEqual(15, config.session_config) + self.assertEqual(16, config.keep_checkpoint_max) + self.assertEqual(17, config.keep_checkpoint_every_n_hours) + + new_config = run_config_lib.RunConfig( + tf_random_seed=21, + save_summary_steps=22, + save_checkpoints_steps=23, + save_checkpoints_secs=24, + session_config=25, + keep_checkpoint_max=26, + keep_checkpoint_every_n_hours=27) + self.assertEqual(config.uid(), new_config.uid()) + # model_dir is not on the default whitelist. + self.assertNotEqual(config.uid(whitelist=[]), + new_config.uid(whitelist=[])) + new_config = new_config.replace(model_dir=ANOTHER_TEST_DIR) + self.assertNotEqual(config.uid(), new_config.uid()) + def test_uid_for_deepcopy(self): tf_config = { "cluster": { diff --git a/tensorflow/contrib/learn/python/learn/learn_runner_test.py b/tensorflow/contrib/learn/python/learn/learn_runner_test.py index 6c8cde453f..77bdcaeb7e 100644 --- a/tensorflow/contrib/learn/python/learn/learn_runner_test.py +++ b/tensorflow/contrib/learn/python/learn/learn_runner_test.py @@ -293,8 +293,7 @@ class LearnRunnerRunWithRunConfigTest(test.TestCase): def _experiment_fn(run_config, hparams): del run_config, hparams # unused. # Explicitly use a new run_config.
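For context on the new `whitelist` argument to `uid()` introduced above, a short illustrative sketch following the naming used in `run_config_test.py` (import path taken from the file paths in this diff; directories are placeholders):

```python
from tensorflow.contrib.learn.python.learn.estimators import run_config as run_config_lib

config = run_config_lib.RunConfig(tf_random_seed=11, model_dir="/tmp/model_a")
new_config = config.replace(model_dir="/tmp/model_b")

# model_dir is not on the default whitelist, so it still affects the uid.
assert config.uid() != new_config.uid()

# Whitelisting "model_dir" removes it from the uid computation, so the
# two configs now produce the same uid.
assert config.uid(["model_dir"]) == new_config.uid(["model_dir"])
```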
- new_config = run_config_lib.RunConfig( - model_dir=_MODIR_DIR, save_checkpoints_steps=123) + new_config = run_config_lib.RunConfig(model_dir=_MODIR_DIR + "/123") return TestExperiment(config=new_config) diff --git a/tensorflow/contrib/losses/__init__.py b/tensorflow/contrib/losses/__init__.py index 9861ecc1f8..790bf61367 100644 --- a/tensorflow/contrib/losses/__init__.py +++ b/tensorflow/contrib/losses/__init__.py @@ -22,10 +22,26 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function -# pylint: disable=unused-import,wildcard-import -from tensorflow.contrib.losses.python import losses +# pylint: disable=wildcard-import from tensorflow.contrib.losses.python.losses import * -# pylint: enable=unused-import,wildcard-import +# pylint: enable=wildcard-import from tensorflow.python.util.all_util import remove_undocumented -remove_undocumented(__name__, doc_string_modules=[losses]) + +_allowed_symbols = [ + 'absolute_difference', + 'add_loss', + 'hinge_loss', + 'compute_weighted_loss', + 'cosine_distance', + 'get_losses', + 'get_regularization_losses', + 'get_total_loss', + 'log_loss', + 'mean_pairwise_squared_error', + 'mean_squared_error', + 'sigmoid_cross_entropy', + 'softmax_cross_entropy', + 'sparse_softmax_cross_entropy', +] +remove_undocumented(__name__, _allowed_symbols) diff --git a/tensorflow/contrib/losses/python/losses/__init__.py b/tensorflow/contrib/losses/python/losses/__init__.py index 1b57f0baee..6e9d1d4a77 100644 --- a/tensorflow/contrib/losses/python/losses/__init__.py +++ b/tensorflow/contrib/losses/python/losses/__init__.py @@ -12,127 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""## Loss operations for use in neural networks. +"""Ops for building neural network losses. -Note: By default all the losses are collected into the `GraphKeys.LOSSES` -collection. - -All of the loss functions take a pair of predictions and ground truth labels, -from which the loss is computed. It is assumed that the shape of both these -tensors is of the form [batch_size, d1, ... dN] where `batch_size` is the number -of samples in the batch and `d1` ... `dN` are the remaining dimensions. - -It is common, when training with multiple loss functions, to adjust the relative -strengths of individual losses. This is performed by rescaling the losses via -a `weight` parameter passed to the loss functions. For example, if we were -training with both log_loss and sum_of_squares_loss, and we wished that the -log_loss penalty be twice as severe as the sum_of_squares_loss, we would -implement this as: - - # Explicitely set the weight. - tf.contrib.losses.log(predictions, labels, weight=2.0) - - # Uses default weight of 1.0 - tf.contrib.losses.sum_of_squares(predictions, labels) - - # All the losses are collected into the `GraphKeys.LOSSES` collection. - losses = tf.get_collection(tf.GraphKeys.LOSSES) - -While specifying a scalar loss rescales the loss over the entire batch, -we sometimes want to rescale the loss per batch sample. For example, if we have -certain examples that matter more to us to get correctly, we might want to have -a higher loss that other samples whose mistakes matter less. In this case, we -can provide a weight vector of length `batch_size` which results in the loss -for each sample in the batch being scaled by the corresponding weight element. 
-For example, consider the case of a classification problem where we want to -maximize our accuracy but we especially interested in obtaining high accuracy -for a specific class: - - inputs, labels = LoadData(batch_size=3) - logits = MyModelPredictions(inputs) - - # Ensures that the loss for examples whose ground truth class is `3` is 5x - # higher than the loss for all other examples. - weight = tf.multiply(4, tf.cast(tf.equal(labels, 3), tf.float32)) + 1 - - onehot_labels = tf.one_hot(labels, num_classes=5) - tf.contrib.losses.softmax_cross_entropy(logits, onehot_labels, weight=weight) - -Finally, in certain cases, we may want to specify a different loss for every -single measurable value. For example, if we are performing per-pixel depth -prediction, or per-pixel denoising, a single batch sample has P values where P -is the number of pixels in the image. For many losses, the number of measurable -values matches the number of elements in the predictions and labels tensors. -For others, such as softmax_cross_entropy and cosine_distance, the -loss functions reduces the dimensions of the inputs to produces a tensor of -losses for each measurable value. For example, softmax_cross_entropy takes as -input predictions and labels of dimension [batch_size, num_classes] but the -number of measurable values is [batch_size]. Consequently, when passing a weight -tensor to specify a different loss for every measurable value, the dimension of -the tensor will depend on the loss being used. - -For a concrete example, consider the case of per-pixel depth prediction where -certain ground truth depth values are missing (due to sensor noise in the -capture process). In this case, we want to assign zero weight to losses for -these predictions. - - # 'depths' that are missing have a value of 0: - images, depths = LoadData(...) - predictions = MyModelPredictions(images) - - weight = tf.cast(tf.greater(depths, 0), tf.float32) - loss = tf.contrib.losses.sum_of_squares(predictions, depths, weight) - -Note that when using weights for the losses, the final average is computed -by rescaling the losses by the weights and then dividing by the total number of -non-zero samples. For an arbitrary set of weights, this may not necessarily -produce a weighted average. Instead, it simply and transparently rescales the -per-element losses before averaging over the number of observations. For example -if the losses computed by the loss function is an array [4, 1, 2, 3] and the -weights are an array [1, 0.5, 3, 9], then the average loss is: - - (4*1 + 1*0.5 + 2*3 + 3*9) / 4 - -However, with a single loss function and an arbitrary set of weights, one can -still easily create a loss function such that the resulting loss is a -weighted average over the individual prediction errors: - - images, labels = LoadData(...) - predictions = MyModelPredictions(images) - - weight = MyComplicatedWeightingFunction(labels) - weight = tf.div(weight, tf.size(weight)) - loss = tf.contrib.losses.sum_of_squares(predictions, depths, weight) - -@@absolute_difference -@@add_loss -@@hinge_loss -@@compute_weighted_loss -@@cosine_distance -@@get_losses -@@get_regularization_losses -@@get_total_loss -@@log_loss -@@mean_pairwise_squared_error -@@mean_squared_error -@@sigmoid_cross_entropy -@@softmax_cross_entropy -@@sparse_softmax_cross_entropy - -The following are deprecated in favor of `mean_pairwise_squared_error` and -`mean_squared_error`. -@@sum_of_pairwise_squares -@@sum_of_squares +See @{$python/contrib.losses}. 
""" - from __future__ import absolute_import from __future__ import division from __future__ import print_function -# pylint: disable=unused-import,wildcard-import +# pylint: disable=wildcard-import from tensorflow.contrib.losses.python.losses.loss_ops import * -from tensorflow.python.util.all_util import make_all -# pylint: enable=unused-import,wildcard-import - -__all__ = make_all(__name__) +# pylint: enable=wildcard-import diff --git a/tensorflow/contrib/makefile/sub_makefiles/hexagon_graph_execution/Makefile.in b/tensorflow/contrib/makefile/sub_makefiles/hexagon_graph_execution/Makefile.in index ccbbfa4132..2a6f66edcb 100644 --- a/tensorflow/contrib/makefile/sub_makefiles/hexagon_graph_execution/Makefile.in +++ b/tensorflow/contrib/makefile/sub_makefiles/hexagon_graph_execution/Makefile.in @@ -47,7 +47,6 @@ GRAPH_TRANSFER_SRCS := \ tensorflow/cc/framework/scope.cc \ tensorflow/cc/framework/ops.cc \ tensorflow/cc/ops/const_op.cc \ -tensorflow/core/kernels/function_ops.cc \ tensorflow/core/kernels/hexagon/graph_transfer_utils.cc \ tensorflow/core/kernels/hexagon/graph_transferer.cc \ tensorflow/core/kernels/hexagon/hexagon_control_wrapper.cc \ diff --git a/tensorflow/contrib/seq2seq/__init__.py b/tensorflow/contrib/seq2seq/__init__.py index dd497197e3..dc159b93a3 100644 --- a/tensorflow/contrib/seq2seq/__init__.py +++ b/tensorflow/contrib/seq2seq/__init__.py @@ -16,36 +16,6 @@ """Ops for building neural network seq2seq decoders and losses. See the @{$python/contrib.seq2seq} guide. - -@@Decoder -@@dynamic_decode - -@@BasicDecoderOutput -@@BasicDecoder - -@@BeamSearchDecoderOutput -@@BeamSearchDecoderState -@@BeamSearchDecoder -@@FinalBeamSearchDecoderOutput - -@@Helper -@@CustomHelper -@@GreedyEmbeddingHelper -@@ScheduledEmbeddingTrainingHelper -@@ScheduledOutputTrainingHelper -@@TrainingHelper - -@@BahdanauAttention -@@LuongAttention - -@@hardmax - -@@AttentionWrapperState -@@AttentionWrapper - -@@gather_tree - -@@tile_batch """ from __future__ import absolute_import @@ -63,6 +33,30 @@ from tensorflow.contrib.seq2seq.python.ops.loss import * from tensorflow.python.util.all_util import remove_undocumented # pylint: enable=unused-import,widcard-import,line-too-long -_allowed_symbols = ["sequence_loss"] +_allowed_symbols = [ + "sequence_loss", + "Decoder", + "dynamic_decode", + "BasicDecoder", + "BasicDecoderOutput", + "BeamSearchDecoder", + "BeamSearchDecoderOutput", + "BeamSearchDecoderState", + "Helper", + "CustomHelper", + "FinalBeamSearchDecoderOutput", + "gather_tree", + "GreedyEmbeddingHelper", + "ScheduledEmbeddingTrainingHelper", + "ScheduledOutputTrainingHelper", + "TrainingHelper", + "BahdanauAttention", + "LuongAttention", + "hardmax", + "AttentionWrapperState", + "AttentionWrapper", + "AttentionMechanism", + "tile_batch"] + remove_undocumented(__name__, _allowed_symbols) diff --git a/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py b/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py index a0f1775257..8d1c0c59e0 100644 --- a/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py +++ b/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py @@ -39,6 +39,7 @@ from tensorflow.python.util import nest __all__ = [ + "AttentionMechanism", "AttentionWrapper", "AttentionWrapperState", "LuongAttention", diff --git a/tensorflow/contrib/seq2seq/python/ops/helper.py b/tensorflow/contrib/seq2seq/python/ops/helper.py index d6c0527ad2..bdd7d7ca73 100644 --- a/tensorflow/contrib/seq2seq/python/ops/helper.py +++ b/tensorflow/contrib/seq2seq/python/ops/helper.py @@ -23,8 
+23,6 @@ import abc import six -from tensorflow.contrib.distributions.python.ops import bernoulli -from tensorflow.contrib.distributions.python.ops import categorical from tensorflow.contrib.seq2seq.python.ops import decoder from tensorflow.python.framework import dtypes from tensorflow.python.framework import ops @@ -35,6 +33,8 @@ from tensorflow.python.ops import embedding_ops from tensorflow.python.ops import math_ops from tensorflow.python.ops import random_ops from tensorflow.python.ops import tensor_array_ops +from tensorflow.python.ops.distributions import bernoulli +from tensorflow.python.ops.distributions import categorical from tensorflow.python.util import nest __all__ = [ diff --git a/tensorflow/core/common_runtime/shape_refiner.cc b/tensorflow/core/common_runtime/shape_refiner.cc index 5135355a94..8eb383a14f 100644 --- a/tensorflow/core/common_runtime/shape_refiner.cc +++ b/tensorflow/core/common_runtime/shape_refiner.cc @@ -88,10 +88,7 @@ Status ShapeRefiner::AddNode(const Node* node) { } // This needs to be filled in with real data in a second pass. - std::vector input_tensors(node->num_inputs()); - std::vector real_tensors(node->num_inputs()); - std::vector attempted_materialization(node->num_inputs()); - std::vector attempted_tensor_as_shape_conversion(node->num_inputs()); + std::vector input_tensors(node->num_inputs(), nullptr); std::vector input_tensors_as_shapes; // Create the inference context for this node with the existing input shapes. @@ -104,78 +101,7 @@ Status ShapeRefiner::AddNode(const Node* node) { } // Run the shape inference function, and return if there was an error. - if (op_reg_data->shape_inference_fn) { - TF_RETURN_IF_ERROR(c->Run(op_reg_data->shape_inference_fn)); - } else { - TF_RETURN_IF_ERROR(c->Run(shape_inference::UnknownShape)); - } - - // We must run the shape function repeatedly, in case users write - // shape functions where they only conditionally call input_tensor() - // based on the values of another input tensor. - bool rerun_shape_fn; - do { - // If the result of running shape inference would have benefitted - // from knowing the values of input tensors, try to materialize - // the results of those tensors, and then run the shape inference - // function again using those known tensors. - rerun_shape_fn = false; - - // NOTE: It is possible to batch the extraction and - // materialization of inputs, instead of materializing one input - // at a time like we do below. If input-at-a-time computation - // becomes a bottleneck, we could separate ExtractConstantSubgraph - // into two functions: one that returns true if an input is - // derivable from constants, and another function that extracts - // the subgraph for multiple target nodes and executes the whole - // subgraph once. - - for (int i = 0; i < c->num_inputs(); ++i) { - if (!c->requested_input_tensor(i)) { - continue; - } - // Check if we have not already filled in the requested input, - // and if not, try to materialize the tensors. - if (!attempted_materialization[i]) { - attempted_materialization[i] = true; - - Tensor result; - bool evaluated = false; - TF_RETURN_IF_ERROR( - EvaluateConstantTensorForEdge(node, i, &evaluated, &result)); - if (evaluated) { - real_tensors[i] = result; - input_tensors[i] = &real_tensors[i]; - // We have more concrete information about a shape, - // so re-run shape inference. 
- rerun_shape_fn = true; - } - } - if (c->requested_input_tensor_as_partial_shape(i) && - !attempted_tensor_as_shape_conversion[i]) { - attempted_tensor_as_shape_conversion[i] = true; - if (i >= input_tensors_as_shapes.size()) { - input_tensors_as_shapes.resize(i + 1); - } - ShapeHandle s; - TF_RETURN_IF_ERROR(ConstantPartialShape(c.get(), node, i, &s)); - input_tensors_as_shapes[i] = s; - rerun_shape_fn = true; - } - } - - if (rerun_shape_fn) { - // We have more information about the shapes on this pass, - // so re-run shape inference. - c->set_input_tensors(input_tensors); - c->set_input_tensors_as_shapes(input_tensors_as_shapes); - if (op_reg_data->shape_inference_fn) { - TF_RETURN_IF_ERROR(op_reg_data->shape_inference_fn(c.get())); - } else { - TF_RETURN_IF_ERROR(shape_inference::UnknownShape(c.get())); - } - } - } while (rerun_shape_fn); + TF_RETURN_IF_ERROR(RunShapeFn(node, op_reg_data, c.get())); // Store the resulting InferenceContext object in the map. node_to_context_[node].swap(c); @@ -211,6 +137,74 @@ Status ShapeRefiner::SetShape(const Node* node, int output_port, return Status::OK(); } +Status ShapeRefiner::UpdateNode(const Node* node, bool* refined) { + auto it = node_to_context_.find(node); + if (it == node_to_context_.end()) { + *refined = true; + return AddNode(node); + } + InferenceContext* node_context = it->second.get(); + + // Give up if the context wasn't successfully built by the AddNode() method. + TF_RETURN_IF_ERROR(node_context->construction_status()); + + // Check if the shapes of the nodes in the fan-in of this node have changed, + // and if they have update the node input shapes. + for (const Edge* e : node->in_edges()) { + if (e->IsControlEdge()) continue; + + Node* input = e->src(); + auto iter = node_to_context_.find(input); + if (iter == node_to_context_.end()) { + return errors::FailedPrecondition( + "Input ", e->dst_input(), " ('", input->name(), "') for '", + node->name(), "' was not previously added to ShapeRefiner."); + } + + InferenceContext* c = iter->second.get(); + DCHECK_GE(e->dst_input(), 0); + if (node_context->set_input(e->dst_input(), c->output(e->src_output()))) { + *refined = true; + } + + // Also propagate handle shape and dtype of edges which are carrying + // resource handles. + if (e->src()->output_type(e->src_output()) == DT_RESOURCE) { + if (node_context->set_input_handle_dtype( + e->dst_input(), c->output_handle_dtype(e->src_output()))) { + *refined = true; + } + if (node_context->set_input_handle_shape( + e->dst_input(), c->output_handle_shape(e->src_output()))) { + *refined = true; + } + } + } + + if (!*refined) { + // No input shape has changed, we're done + return Status::OK(); + } + + // Get and run the shape function for this node to update the shapes of the + // outputs. 
+ const OpRegistrationData* op_reg_data; + TF_RETURN_IF_ERROR(ops_registry_->LookUp(node->type_string(), &op_reg_data)); + if (op_reg_data->shape_inference_fn == nullptr && + require_shape_inference_fns_) { + return errors::InvalidArgument( + "No shape inference function exists for op '", node->type_string(), + "', did you forget to define it?"); + } + + if (!op_reg_data->shape_inference_fn) { + // There is nothing more we can infer + return Status::OK(); + } + + return RunShapeFn(node, op_reg_data, node_context); +} + Status ShapeRefiner::EvaluateConstantTensorForEdge(const Node* node, int dst_idx, bool* evaluated, Tensor* result) { @@ -463,4 +457,93 @@ Status ShapeRefiner::ConstantPartialShape(InferenceContext* target_context, return Status::OK(); } +Status ShapeRefiner::RunShapeFn(const Node* node, + const OpRegistrationData* op_reg_data, + shape_inference::InferenceContext* c) { + // This will be filled in with real data in a second pass. + std::vector input_tensors(node->num_inputs(), nullptr); + std::vector real_tensors(node->num_inputs()); + std::vector attempted_materialization(node->num_inputs()); + std::vector attempted_tensor_as_shape_conversion(node->num_inputs()); + std::vector input_tensors_as_shapes; + + // Run the shape inference function, and return if there was an error. + c->set_input_tensors(input_tensors); + c->set_input_tensors_as_shapes(input_tensors_as_shapes); + if (op_reg_data->shape_inference_fn) { + TF_RETURN_IF_ERROR(c->Run(op_reg_data->shape_inference_fn)); + } else { + TF_RETURN_IF_ERROR(c->Run(shape_inference::UnknownShape)); + } + + // We must run the shape function repeatedly, in case users write + // shape functions where they only conditionally call input_tensor() + // based on the values of another input tensor. + bool rerun_shape_fn; + do { + // If the result of running shape inference would have benefitted + // from knowing the values of input tensors, try to materialize + // the results of those tensors, and then run the shape inference + // function again using those known tensors. + rerun_shape_fn = false; + + // NOTE: It is possible to batch the extraction and + // materialization of inputs, instead of materializing one input + // at a time like we do below. If input-at-a-time computation + // becomes a bottleneck, we could separate ExtractConstantSubgraph + // into two functions: one that returns true if an input is + // derivable from constants, and another function that extracts + // the subgraph for multiple target nodes and executes the whole + // subgraph once. + + for (int i = 0; i < c->num_inputs(); ++i) { + if (!c->requested_input_tensor(i)) { + continue; + } + // Check if we have not already filled in the requested input, + // and if not, try to materialize the tensors. + if (!attempted_materialization[i]) { + attempted_materialization[i] = true; + + Tensor result; + bool evaluated = false; + TF_RETURN_IF_ERROR( + EvaluateConstantTensorForEdge(node, i, &evaluated, &result)); + if (evaluated) { + real_tensors[i] = result; + input_tensors[i] = &real_tensors[i]; + // We have more concrete information about a shape, + // so re-run shape inference. 
+ rerun_shape_fn = true; + } + } + if (c->requested_input_tensor_as_partial_shape(i) && + !attempted_tensor_as_shape_conversion[i]) { + attempted_tensor_as_shape_conversion[i] = true; + if (i >= input_tensors_as_shapes.size()) { + input_tensors_as_shapes.resize(i + 1); + } + ShapeHandle s; + TF_RETURN_IF_ERROR(ConstantPartialShape(c, node, i, &s)); + input_tensors_as_shapes[i] = s; + rerun_shape_fn = true; + } + } + + if (rerun_shape_fn) { + // We have more information about the shapes on this pass, + // so re-run shape inference. + c->set_input_tensors(input_tensors); + c->set_input_tensors_as_shapes(input_tensors_as_shapes); + if (op_reg_data->shape_inference_fn) { + TF_RETURN_IF_ERROR(op_reg_data->shape_inference_fn(c)); + } else { + TF_RETURN_IF_ERROR(shape_inference::UnknownShape(c)); + } + } + } while (rerun_shape_fn); + + return Status::OK(); +} + } // namespace tensorflow diff --git a/tensorflow/core/common_runtime/shape_refiner.h b/tensorflow/core/common_runtime/shape_refiner.h index 2d04ea1505..9709bd0302 100644 --- a/tensorflow/core/common_runtime/shape_refiner.h +++ b/tensorflow/core/common_runtime/shape_refiner.h @@ -55,6 +55,11 @@ class ShapeRefiner { Status SetShape(const Node* node, int output_port, shape_inference::ShapeHandle shape); + // Update the input shapes of node in case the shapes of the fan-ins of 'node' + // have themselves been modified (For example, in case of incremental shape + // refinement). Sets refined to true if any of the node shape has changed. + Status UpdateNode(const Node* node, bool* refined); + // Returns the InferenceContext for 'node', if present. shape_inference::InferenceContext* GetContext(const Node* node) const { auto it = node_to_context_.find(node); @@ -108,6 +113,9 @@ class ShapeRefiner { const Node* node, int dst_idx, shape_inference::ShapeHandle* result); + Status RunShapeFn(const Node* node, const OpRegistrationData* op_reg_data, + shape_inference::InferenceContext* c); + int32 graph_def_version_; const OpRegistryInterface* const ops_registry_; diff --git a/tensorflow/core/common_runtime/shape_refiner_test.cc b/tensorflow/core/common_runtime/shape_refiner_test.cc index d7e7c3b5ad..b8df6dd4f6 100644 --- a/tensorflow/core/common_runtime/shape_refiner_test.cc +++ b/tensorflow/core/common_runtime/shape_refiner_test.cc @@ -768,5 +768,38 @@ TEST(ShapeRefinerTest, ConstantValueAsShape_ConcatInvalidDimValue) { m.AddNode(result).error_message()); } +TEST(ShapeRefinerTest, IncrementalUpdates) { + Scope root = Scope::NewRootScope(); + Graph* g = root.graph(); + Node* queue; + TF_CHECK_OK(NodeBuilder("queue", "FIFOQueueV2") + .Attr("component_types", {DT_FLOAT}) + .Finalize(g, &queue)); + Node* dequeue; + TF_CHECK_OK(NodeBuilder("dequeue", "QueueDequeueV2") + .Attr("component_types", {DT_FLOAT}) + .Input(queue) + .Finalize(g, &dequeue)); + ShapeRefiner m(TF_GRAPH_DEF_VERSION, OpRegistry::Global()); + TF_ASSERT_OK(m.AddNode(queue)); + TF_ASSERT_OK(m.AddNode(dequeue)); + + // At this point, the shapes of the dequeued tensor are unknown. + shape_inference::InferenceContext* ctx = m.GetContext(dequeue); + EXPECT_EQ("?", ctx->DebugString(ctx->output(0))); + + // Inject a shape, and incrementally propagate it to the dequeue op. 
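    // Setting the handle shape and dtype on the queue's context below
    // effectively stands in for what an Enqueue op would publish; UpdateNode()
    // then carries that information across the DT_RESOURCE edge to the
    // dequeue op, whose output becomes [3,7].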
+ ctx = m.GetContext(queue); + shape_inference::ShapeHandle shp = ctx->MakeShape({3, 7}); + ctx->set_output_handle_shape(0, shp); + ctx->set_output_handle_dtype(0, DT_FLOAT); + + bool refined = false; + TF_ASSERT_OK(m.UpdateNode(dequeue, &refined)); + EXPECT_TRUE(refined); + ctx = m.GetContext(dequeue); + EXPECT_EQ("[3,7]", ctx->DebugString(ctx->output(0))); +} + } // namespace } // namespace tensorflow diff --git a/tensorflow/core/framework/function.cc b/tensorflow/core/framework/function.cc index c731155924..a387d49613 100644 --- a/tensorflow/core/framework/function.cc +++ b/tensorflow/core/framework/function.cc @@ -582,7 +582,7 @@ string Print(const GraphDef& gdef) { for (size_t i = 0; i < arg.size(); ++i) { const NodeDef* n = arg[i]; if (i > 0) strings::StrAppend(&out, ", "); - CHECK_EQ(2, n->attr_size()); + CHECK_GE(n->attr_size(), 2); strings::StrAppend(&out, n->name(), ":", get_type(*n)); } strings::StrAppend(&out, ") -> ("); diff --git a/tensorflow/core/framework/shape_inference.h b/tensorflow/core/framework/shape_inference.h index e88f6dbb04..71663027b3 100644 --- a/tensorflow/core/framework/shape_inference.h +++ b/tensorflow/core/framework/shape_inference.h @@ -191,6 +191,17 @@ class InferenceContext { return s; } + // Set the shape of the input in position idx. This requires idx to be in the + // [0, num_inputs) range. Returns true iff the stored input shape has been + // updated with a different handle. + bool set_input(int idx, ShapeHandle shape) { + if (!inputs_[idx].SameHandle(shape)) { + inputs_[idx] = shape; + return true; + } else { + return false; + } + } ShapeHandle input(int64 idx) const { return inputs_[idx]; } Status input(StringPiece input_name, std::vector* output) const; int num_inputs() const { return inputs_.size(); } @@ -430,15 +441,53 @@ class InferenceContext { // and dtypes of tensors which can be accessed via the handle. These methods // propagate that information. Output handle dtypes and shapes are ignored if // the output tensor is not of type DT_RESOURCE. + + // Set the shape corresponding to the resource in position idx. This requires + // idx to be in the [0, num_inputs) range. Returns true iff the stored shape + // has been updated with a different handle. + bool set_input_handle_shape(int idx, ShapeHandle shape) { + if (!input_handle_shape_[idx].SameHandle(shape)) { + input_handle_shape_[idx] = shape; + return true; + } + return false; + } + + // Set the type corresponding to the resource in position idx. This requires + // idx to be in the [0, num_inputs) range. Returns true iff the stored type + // has been updated. + bool set_input_handle_dtype(int idx, DataType dtype) { + if (input_handle_dtype_[idx] != dtype) { + input_handle_dtype_[idx] = dtype; + return true; + } + return false; + } ShapeHandle input_handle_shape(int idx); DataType input_handle_dtype(int idx) const { return input_handle_dtype_[idx]; } - void set_output_handle_shape(int idx, ShapeHandle shape) { - output_handle_shape_[idx] = shape; + + // Set the shape corresponding to the resource in position idx. This requires + // idx to be in the [0, num_outputs) range. + // Returns true iff the stored shape has been updated with a different handle. 
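  // Like set_input() above, the handle setters below only overwrite the stored
  // value when it actually differs and report whether anything changed; this
  // is what lets ShapeRefiner::UpdateNode() and the grappler queue propagation
  // re-run shape inference only for nodes whose inputs were genuinely refined.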
+ bool set_output_handle_shape(int idx, ShapeHandle shape) { + if (!output_handle_shape_[idx].SameHandle(shape)) { + output_handle_shape_[idx] = shape; + return true; + } + return false; } - void set_output_handle_dtype(int idx, DataType dtype) { - output_handle_dtype_[idx] = dtype; + + // Set the type corresponding to the resource in position idx. This requires + // idx to be in the [0, num_outputs) range. Returns true iff the stored type + // has been updated. + bool set_output_handle_dtype(int idx, DataType dtype) { + if (output_handle_dtype_[idx] != dtype) { + output_handle_dtype_[idx] = dtype; + return true; + } + return false; } ShapeHandle output_handle_shape(int idx) const { return output_handle_shape_[idx]; diff --git a/tensorflow/core/framework/shape_inference_test.cc b/tensorflow/core/framework/shape_inference_test.cc index c82b506e4b..78d1fc0fc5 100644 --- a/tensorflow/core/framework/shape_inference_test.cc +++ b/tensorflow/core/framework/shape_inference_test.cc @@ -558,6 +558,11 @@ TEST_F(ShapeInferenceTest, MergeShape) { EXPECT_TRUE(SameHandle(c.Dim(s_1_u, 0), c.Dim(out, 0))); EXPECT_TRUE(SameHandle(c.Dim(s_u_2, 1), c.Dim(out, 1))); + auto s_u1 = c.UnknownShapeOfRank(1); + auto s_u2 = c.UnknownShapeOfRank(1); + TF_EXPECT_OK(c.Merge(s_u1, s_u2, &out)); + EXPECT_TRUE(SameHandle(s_u1, out)); + // Incompatible merges give errors and set out to nullptr. out = s_unknown; EXPECT_TRUE( diff --git a/tensorflow/core/grappler/clusters/BUILD b/tensorflow/core/grappler/clusters/BUILD index 34ad404856..b48025b86f 100644 --- a/tensorflow/core/grappler/clusters/BUILD +++ b/tensorflow/core/grappler/clusters/BUILD @@ -58,6 +58,7 @@ cc_library( "//tensorflow/core:core_cpu", "//tensorflow/core:direct_session", "//tensorflow/core:lib", + "//tensorflow/core:protos_all_cc", "//tensorflow/core/grappler:utils", "//tensorflow/core/kernels:ops_util", ], diff --git a/tensorflow/core/grappler/clusters/single_machine.cc b/tensorflow/core/grappler/clusters/single_machine.cc index 09c8d55efd..abb9e4245e 100644 --- a/tensorflow/core/grappler/clusters/single_machine.cc +++ b/tensorflow/core/grappler/clusters/single_machine.cc @@ -18,6 +18,7 @@ limitations under the License. #include #include "tensorflow/cc/training/queue_runner.h" +#include "tensorflow/core/framework/step_stats.pb.h" #include "tensorflow/core/grappler/utils.h" #include "tensorflow/core/kernels/ops_util.h" #include "tensorflow/core/lib/core/errors.h" @@ -111,6 +112,8 @@ Status SingleMachine::Run(const GraphDef& graph_def, for (auto node : *init_metadata_.mutable_cost_graph()->mutable_node()) { node.clear_compute_cost(); } + // Also clear the timeline to save memory + init_metadata_.clear_step_stats(); } for (int i = 0; i < queue_runner_defs_.size(); ++i) { std::unique_ptr queue_runner; @@ -133,15 +136,17 @@ Status SingleMachine::Run(const GraphDef& graph_def, } } - TF_RETURN_IF_ERROR(RunWithTimeout(feed, fetch, metadata)); - if (metadata) { - // Add the costs of initialization and the queue runners. - metadata->MergeFrom(init_metadata_); - return coordinator_->ExportCostGraph(metadata->mutable_cost_graph()); + TF_RETURN_IF_ERROR(RunWithTimeout(feed, fetch, metadata)); + // Merge the costs of the initialization and the queue runners. 
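  // MergeCosts() below keeps the costs measured on the main graph and only
  // appends queue-runner and initialization nodes that the main run did not
  // already cover, the initialization costs being treated as possibly stale.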
+ CostGraphDef queue_costs; + TF_RETURN_IF_ERROR(coordinator_->ExportCostGraph(&queue_costs)); + MergeCosts(metadata->mutable_cost_graph(), init_metadata_.cost_graph(), + queue_costs); } else { - return Status::OK(); + return RunWithTimeout(feed, fetch, nullptr); } + return Status::OK(); } Status SingleMachine::RunWithTimeout( @@ -249,5 +254,36 @@ Status SingleMachine::ResetSession() { return Status::OK(); } +void SingleMachine::MergeCosts(CostGraphDef* graph_costs, + const CostGraphDef& init_costs, + const CostGraphDef& queue_costs) { + graph_costs->mutable_node()->Reserve(graph_costs->node_size() + + init_costs.node_size() + + queue_costs.node_size()); + std::unordered_set nodes_seen; + for (const auto& node : graph_costs->node()) { + nodes_seen.insert(node.name()); + } + + // The costs obtained by running the main graph could be more stable than + // the one we get from the queue runners since the queue runners run + // asynchronously. + for (const auto& node : queue_costs.node()) { + if (nodes_seen.find(node.name()) != nodes_seen.end()) { + continue; + } + graph_costs->add_node()->MergeFrom(node); + } + + // Don't overwrite the costs with that generated during initialization since + // these are possibly outdated. + for (const auto& node : init_costs.node()) { + if (nodes_seen.find(node.name()) != nodes_seen.end()) { + continue; + } + graph_costs->add_node()->MergeFrom(node); + } +} + } // namespace grappler } // namespace tensorflow diff --git a/tensorflow/core/grappler/clusters/single_machine.h b/tensorflow/core/grappler/clusters/single_machine.h index f69b11df5d..f2773376e4 100644 --- a/tensorflow/core/grappler/clusters/single_machine.h +++ b/tensorflow/core/grappler/clusters/single_machine.h @@ -47,6 +47,8 @@ class SingleMachine : public Cluster { RunMetadata* run_metadata, int64 timeout_s); Status ResetSession(); Status CloseSession(bool use_timeout); + void MergeCosts(CostGraphDef* graph_costs, const CostGraphDef& init_costs, + const CostGraphDef& queue_costs); const int num_gpus_; std::unique_ptr session_; diff --git a/tensorflow/core/grappler/clusters/single_machine_test.cc b/tensorflow/core/grappler/clusters/single_machine_test.cc index 0572aa04be..17db48817e 100644 --- a/tensorflow/core/grappler/clusters/single_machine_test.cc +++ b/tensorflow/core/grappler/clusters/single_machine_test.cc @@ -159,6 +159,121 @@ TEST_F(SingleMachineTest, InitializationMemory) { EXPECT_TRUE(found); } +namespace { +template +inline void SetNodeAttr(const string& key, const T& value, NodeDef* node) { + AttrValue attr_value; + SetAttrValue(value, &attr_value); + auto* attr_map = node->mutable_attr(); + (*attr_map)[key] = attr_value; +} +template <> +inline void SetNodeAttr(const string& key, const Tensor& tensor, + NodeDef* node) { + TensorProto tensor_proto; + tensor.AsProtoTensorContent(&tensor_proto); + SetNodeAttr(key, tensor_proto, node); +} + +} // namespace + +TEST_F(SingleMachineTest, PersistentMemory) { + // Build a hashtable and its initialization graph. 
+ GrapplerItem item; + const DataType key_dtype = DT_INT64; + const DataType data_dtype = DT_INT64; + + NodeDef* hashtable_node = item.graph.add_node(); + hashtable_node->set_op("HashTable"); + hashtable_node->set_name("hash_table"); + SetNodeAttr("key_dtype", key_dtype, hashtable_node); + SetNodeAttr("value_dtype", data_dtype, hashtable_node); + + // Initial hashtable keys and values + NodeDef* keys_node = item.graph.add_node(); + keys_node->set_op("Const"); + keys_node->set_name("table_keys"); + SetNodeAttr("dtype", key_dtype, keys_node); + Tensor keys(key_dtype, TensorShape{2}); + keys.vec()(0) = 123; + keys.vec()(1) = 321; + SetNodeAttr("value", keys, keys_node); + + NodeDef* values_node = item.graph.add_node(); + values_node->set_op("Const"); + values_node->set_name("table_values"); + SetNodeAttr("dtype", data_dtype, values_node); + Tensor values(data_dtype, TensorShape{2}); + values.vec()(0) = 789; + values.vec()(1) = 987; + SetNodeAttr("value", values, values_node); + + // InitializeTable node + NodeDef* init_table_node = item.graph.add_node(); + init_table_node->set_op("InitializeTable"); + init_table_node->set_name("initialize_table"); + SetNodeAttr("Tkey", key_dtype, init_table_node); + SetNodeAttr("Tval", data_dtype, init_table_node); + *init_table_node->add_input() = "hash_table"; + *init_table_node->add_input() = "table_keys"; + *init_table_node->add_input() = "table_values"; + item.init_ops.push_back(init_table_node->name()); + + // Key to lookup + NodeDef* query_node = item.graph.add_node(); + query_node->set_op("Const"); + query_node->set_name("query"); + SetNodeAttr("dtype", key_dtype, query_node); + Tensor query(key_dtype, TensorShape({})); + query.flat()(0) = 0; + SetNodeAttr("value", query, query_node); + + // Default return value of hashtable lookup + NodeDef* default_value_node = item.graph.add_node(); + default_value_node->set_op("Const"); + default_value_node->set_name("default_table_value"); + SetNodeAttr("dtype", data_dtype, default_value_node); + Tensor dflt(data_dtype, TensorShape({})); + dflt.flat()(0) = 456; + SetNodeAttr("value", dflt, default_value_node); + + // HashTable lookup node + NodeDef* lookup_node = item.graph.add_node(); + lookup_node->set_op("LookupTableFind"); + lookup_node->set_name("table_lookup"); + SetNodeAttr("Tin", key_dtype, lookup_node); + SetNodeAttr("Tout", data_dtype, lookup_node); + *lookup_node->add_input() = "hash_table"; + *lookup_node->add_input() = "query"; + *lookup_node->add_input() = "default_table_value"; + item.fetch.push_back(lookup_node->name()); + + // Run the graph + TF_CHECK_OK(cluster_->Initialize(item)); + RunMetadata metadata; + TF_CHECK_OK(cluster_->Run(item.graph, item.feed, item.fetch, &metadata)); + + // Check the cost model. + bool found_table_init = false; + bool found_hashtable = false; + for (const auto& node : metadata.cost_graph().node()) { + if (node.name() == "hash_table") { + found_hashtable = true; + // Persistent memory usage should be 0 since it's recorded as part of the + // initialize_table op. + EXPECT_EQ(0, node.host_persistent_memory_size()); + EXPECT_EQ(0, node.device_persistent_memory_size()); + } else if (node.name() == "initialize_table") { + found_table_init = true; + // Persistent memory should hold 2 keys and 2 values. 
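      // Two int64 keys plus two int64 values at 8 bytes each give the
      // 4 * sizeof(int64) = 32 byte lower bound checked below; the table
      // implementation is free to allocate more on top of that.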
+ EXPECT_LE(4 * sizeof(int64), node.host_persistent_memory_size()); + EXPECT_EQ(0, node.device_persistent_memory_size()); + } + } + EXPECT_TRUE(found_table_init); + EXPECT_TRUE(found_hashtable); +} + } // namespace } // namespace grappler } // namespace tensorflow diff --git a/tensorflow/core/grappler/costs/BUILD b/tensorflow/core/grappler/costs/BUILD index d078d9af09..e784c2df44 100644 --- a/tensorflow/core/grappler/costs/BUILD +++ b/tensorflow/core/grappler/costs/BUILD @@ -50,11 +50,13 @@ cc_test( args = ["--heap_check=local"], # The GPU tracer leaks memory deps = [ ":graph_properties", + "//tensorflow/cc:cc_ops", + "//tensorflow/cc:scope", + "//tensorflow/core:framework", "//tensorflow/core:lib_proto_parsing", "//tensorflow/core:test", "//tensorflow/core:test_main", "//tensorflow/core/grappler:grappler_item", - "//tensorflow/core/grappler:grappler_item_builder", "//tensorflow/core/grappler/clusters:single_machine", "//tensorflow/core/grappler/inputs:trivial_test_graph_input_yielder", ], diff --git a/tensorflow/core/grappler/costs/graph_properties.cc b/tensorflow/core/grappler/costs/graph_properties.cc index 06e91af2c2..31c1043ae6 100644 --- a/tensorflow/core/grappler/costs/graph_properties.cc +++ b/tensorflow/core/grappler/costs/graph_properties.cc @@ -15,6 +15,9 @@ limitations under the License. #include "tensorflow/core/grappler/costs/graph_properties.h" +#include +#include +#include #include "tensorflow/core/common_runtime/shape_refiner.h" #include "tensorflow/core/framework/tensor_shape.pb.h" #include "tensorflow/core/graph/graph_constructor.h" @@ -31,6 +34,76 @@ Status GraphProperties::InferStatically() { Status s = ImportGraphDef(options, item_.graph, &graph, &shape_refiner); TF_RETURN_IF_ERROR(s); + // List the resources and the nodes using them + std::unordered_map> resources; + for (const Node* const node : graph.nodes()) { + for (int i = 0; i < node->num_inputs(); ++i) { + if (node->input_type(i) == DataType::DT_RESOURCE) { + const Node* resource; + TF_CHECK_OK(node->input_node(i, &resource)); + resources[resource].insert(node); + } + } + } + + // If we found a resource, try to propagate the shapes through it. + bool done = true; + do { + std::queue new_shapes; + for (const auto& resource_data : resources) { + const Node* qnode = resource_data.first; + StringPiece type(qnode->type_string()); + if (!type.ends_with("QueueV2")) { + continue; + } + auto qctx = shape_refiner.GetContext(qnode); + if (!qctx) { + continue; + } + DataType queue_type = qctx->output_handle_dtype(0); + shape_inference::ShapeHandle queue_shp = qctx->output_handle_shape(0); + if (qctx->FullyDefined(queue_shp) && queue_type != DT_INVALID) { + continue; + } + + for (const auto& node : resource_data.second) { + auto ctx = shape_refiner.GetContext(node); + if (!ctx) { + continue; + } + if (node->type_string().find("Enqueue") != std::string::npos) { + if (ctx->num_inputs() == 2) { + const DataType dtype = node->input_type(1); + if (queue_type == DT_INVALID) { + queue_type = dtype; + } else { + CHECK_EQ(queue_type, dtype); + } + shape_inference::ShapeHandle shp = ctx->input(1); + TF_RETURN_IF_ERROR(qctx->Merge(queue_shp, shp, &queue_shp)); + } + } + } + if (qctx->set_output_handle_dtype(0, queue_type) || + qctx->set_output_handle_shape(0, queue_shp)) { + new_shapes.push(qnode); + } + } + // Propagate the shapes in the transitive fan-out of the queue. 
+ done = new_shapes.empty(); + while (!new_shapes.empty()) { + const Node* n = new_shapes.front(); + new_shapes.pop(); + for (const Node* fanout : n->out_nodes()) { + bool updated = false; + TF_RETURN_IF_ERROR(shape_refiner.UpdateNode(fanout, &updated)); + if (updated) { + new_shapes.push(fanout); + } + } + } + } while (!done); + for (const Node* const node : graph.nodes()) { VLOG(1) << " " << node->name(); auto ctx = shape_refiner.GetContext(node); diff --git a/tensorflow/core/grappler/costs/graph_properties_test.cc b/tensorflow/core/grappler/costs/graph_properties_test.cc index 32683644fb..94b809dc44 100644 --- a/tensorflow/core/grappler/costs/graph_properties_test.cc +++ b/tensorflow/core/grappler/costs/graph_properties_test.cc @@ -14,6 +14,9 @@ limitations under the License. ==============================================================================*/ #include "tensorflow/core/grappler/costs/graph_properties.h" +#include "tensorflow/cc/framework/scope.h" +#include "tensorflow/cc/ops/standard_ops.h" +#include "tensorflow/core/framework/node_def_builder.h" #include "tensorflow/core/grappler/clusters/single_machine.h" #include "tensorflow/core/grappler/grappler_item.h" #include "tensorflow/core/grappler/inputs/trivial_test_graph_input_yielder.h" @@ -129,6 +132,101 @@ TEST_F(GraphPropertiesTest, DynamicProperties) { } } +TEST_F(GraphPropertiesTest, VarHandles) { + GrapplerItem item; + TF_CHECK_OK(NodeDefBuilder("Var", "VarHandleOp") + .Attr("dtype", DT_FLOAT) + .Attr("shape", TensorShape({3, 7})) + .Finalize(item.graph.add_node())); + + TF_CHECK_OK(NodeDefBuilder("VarRead", "ReadVariableOp") + .Attr("dtype", DT_FLOAT) + .Input("Var", 0, DT_RESOURCE) + .Finalize(item.graph.add_node())); + + GraphProperties properties(item); + TF_CHECK_OK(properties.InferStatically()); + + const auto props = properties.GetOutputProperties("VarRead"); + EXPECT_EQ(1, props.size()); + const OpInfo::TensorProperties& prop = props[0]; + EXPECT_EQ(DT_FLOAT, prop.dtype()); + EXPECT_FALSE(prop.shape().unknown_rank()); + EXPECT_EQ(2, prop.shape().dim_size()); + EXPECT_EQ(3, prop.shape().dim(0).size()); + EXPECT_EQ(7, prop.shape().dim(1).size()); +} + +TEST_F(GraphPropertiesTest, Queues) { + // Create a graph with known input shapes, and propagate the shapes through a + // couple of queues. 
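  // The graph wired up below looks roughly like this:
  //   rnd[3,7] -> Square1 -> Queue1 -> Dequeue1 -> Square2 -> Queue2 -> Dequeue2
  //   Queue3 -> Dequeue3                (nothing is ever enqueued: shape unknown)
  //   Square2 -> Enqueue4 -> Queue4 <- Enqueue4_2 <- Dequeue3 ; Dequeue4 merges both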
+ tensorflow::Scope root = tensorflow::Scope::NewRootScope(); + + auto q1 = ops::FIFOQueue(root.WithOpName("Queue1"), {DataType::DT_FLOAT}); + Output rnd = + ops::RandomNormal(root.WithOpName("rnd"), {3, 7}, DataType::DT_FLOAT); + Output square1 = ops::Square(root.WithOpName("Square1"), rnd); + auto enqueue1 = ops::QueueEnqueue(root.WithOpName("Enqueue1"), q1, {square1}); + auto dequeue1 = + ops::QueueDequeue(root.WithOpName("Dequeue1"), q1, {DataType::DT_FLOAT}); + + auto q2 = + ops::RandomShuffleQueue(root.WithOpName("Queue2"), {DataType::DT_FLOAT}); + Output square2 = ops::Square(root.WithOpName("Square2"), dequeue1[0]); + auto enqueue2 = ops::QueueEnqueue(root.WithOpName("Enqueue2"), q2, {square2}); + auto dequeue2 = + ops::QueueDequeue(root.WithOpName("Dequeue2"), q2, {DataType::DT_FLOAT}); + + auto q3 = + ops::RandomShuffleQueue(root.WithOpName("Queue3"), {DataType::DT_FLOAT}); + auto dequeue3 = + ops::QueueDequeue(root.WithOpName("Dequeue3"), q3, {DataType::DT_FLOAT}); + + auto q4 = + ops::RandomShuffleQueue(root.WithOpName("Queue4"), {DataType::DT_FLOAT}); + auto enqueue4 = ops::QueueEnqueue(root.WithOpName("Enqueue4"), q4, {square2}); + auto enqueue4_2 = + ops::QueueEnqueue(root.WithOpName("Enqueue4_2"), q4, {dequeue3[0]}); + auto dequeue4 = + ops::QueueDequeue(root.WithOpName("Dequeue4"), q4, {DataType::DT_FLOAT}); + + GrapplerItem item; + TF_CHECK_OK(root.ToGraphDef(&item.graph)); + + GraphProperties properties(item); + TF_CHECK_OK(properties.InferStatically()); + + const auto props1 = properties.GetOutputProperties("Dequeue1"); + EXPECT_EQ(1, props1.size()); + const OpInfo::TensorProperties& prop1 = props1[0]; + EXPECT_EQ(DT_FLOAT, prop1.dtype()); + EXPECT_FALSE(prop1.shape().unknown_rank()); + EXPECT_EQ(2, prop1.shape().dim_size()); + EXPECT_EQ(3, prop1.shape().dim(0).size()); + EXPECT_EQ(7, prop1.shape().dim(1).size()); + + const auto props2 = properties.GetOutputProperties("Dequeue2"); + EXPECT_EQ(1, props2.size()); + const OpInfo::TensorProperties& prop2 = props2[0]; + EXPECT_EQ(DT_FLOAT, prop2.dtype()); + EXPECT_FALSE(prop2.shape().unknown_rank()); + EXPECT_EQ(2, prop2.shape().dim_size()); + EXPECT_EQ(3, prop2.shape().dim(0).size()); + EXPECT_EQ(7, prop2.shape().dim(1).size()); + + // The dequeue3 op shape is unknown. The square2 op shape is known. Verify + // that we merge the 2 properly to determine the shape of the data coming out + // of the queue. 
+ const auto props4 = properties.GetOutputProperties("Dequeue4"); + EXPECT_EQ(1, props4.size()); + const OpInfo::TensorProperties& prop4 = props4[0]; + EXPECT_EQ(DT_FLOAT, prop4.dtype()); + EXPECT_FALSE(prop4.shape().unknown_rank()); + EXPECT_EQ(2, prop4.shape().dim_size()); + EXPECT_EQ(3, prop4.shape().dim(0).size()); + EXPECT_EQ(7, prop4.shape().dim(1).size()); +} + } // namespace } // namespace grappler } // namespace tensorflow diff --git a/tensorflow/core/kernels/BUILD b/tensorflow/core/kernels/BUILD index 231e06d5f4..29b4d63bbf 100644 --- a/tensorflow/core/kernels/BUILD +++ b/tensorflow/core/kernels/BUILD @@ -2042,6 +2042,7 @@ tf_kernel_library( deps = [ "//tensorflow/core:framework", "//tensorflow/core:lib", + "//tensorflow/core/platform/default/build_config:cublas_plugin", "@local_config_cuda//cuda:cusolver", ], ) @@ -2322,7 +2323,9 @@ tf_kernel_library( prefix = "fft_ops", deps = MATH_DEPS + [ "//tensorflow/core:spectral_ops_op_lib", - ], + ] + if_cuda([ + "//tensorflow/core/platform/default/build_config:cufft_plugin", + ]), ) tf_kernel_library( @@ -2626,7 +2629,9 @@ tf_kernel_library( "@libxsmm_archive//:xsmm_avx", ], "//conditions:default": [], - }), + }) + if_cuda([ + "//tensorflow/core/platform/default/build_config:cudnn_plugin", + ]), ) tf_kernel_library( diff --git a/tensorflow/core/kernels/crop_and_resize_op.cc b/tensorflow/core/kernels/crop_and_resize_op.cc index 746fe63e2a..1c7afcf866 100644 --- a/tensorflow/core/kernels/crop_and_resize_op.cc +++ b/tensorflow/core/kernels/crop_and_resize_op.cc @@ -19,6 +19,9 @@ limitations under the License. #include "tensorflow/core/kernels/crop_and_resize_op.h" +#include +#include + #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor" #include "tensorflow/core/framework/op_kernel.h" #include "tensorflow/core/framework/register_types.h" @@ -26,10 +29,13 @@ limitations under the License. #include "tensorflow/core/framework/tensor_shape.h" #include "tensorflow/core/framework/types.h" #include "tensorflow/core/kernels/bounds_check.h" +#include "tensorflow/core/lib/core/errors.h" #include "tensorflow/core/lib/core/status.h" #include "tensorflow/core/platform/logging.h" +#include "tensorflow/core/platform/types.h" #if GOOGLE_CUDA +#include "tensorflow/core/common_runtime/gpu/gpu_event_mgr.h" #include "tensorflow/core/platform/stream_executor.h" #endif // GOOGLE_CUDA @@ -37,41 +43,67 @@ namespace tensorflow { typedef Eigen::ThreadPoolDevice CPUDevice; typedef Eigen::GpuDevice GPUDevice; +using Callback = std::function; + +namespace { -static inline void ParseAndCheckBoxSizes(OpKernelContext* context, - const Tensor& boxes, - const Tensor& box_ind, - int* num_boxes) { - if (boxes.NumElements() == 0 && box_ind.NumElements() == 0) { +static inline Status ParseAndCheckBoxSizes(const Tensor& boxes, + const Tensor& box_index, + int* num_boxes) { + if (boxes.NumElements() == 0 && box_index.NumElements() == 0) { *num_boxes = 0; - return; + return Status::OK(); } // The shape of 'boxes' is [num_boxes, 4]. - OP_REQUIRES(context, boxes.dims() == 2, - errors::InvalidArgument("boxes must be 2-D", - boxes.shape().DebugString())); + if (boxes.dims() != 2) { + return errors::InvalidArgument("boxes must be 2-D", + boxes.shape().DebugString()); + } *num_boxes = boxes.dim_size(0); - OP_REQUIRES(context, boxes.dim_size(1) == 4, - errors::InvalidArgument("boxes must have 4 columns")); - - // The shape of 'box_ind' is [num_boxes]. 
- OP_REQUIRES(context, box_ind.dims() == 1, - errors::InvalidArgument("box_ind must be 1-D", - box_ind.shape().DebugString())); - OP_REQUIRES(context, box_ind.dim_size(0) == *num_boxes, - errors::InvalidArgument("box_ind has incompatible shape")); + if (boxes.dim_size(1) != 4) { + return errors::InvalidArgument("boxes must have 4 columns"); + } + // The shape of 'box_index' is [num_boxes]. + if (box_index.dims() != 1) { + return errors::InvalidArgument("box_index must be 1-D", + box_index.shape().DebugString()); + } + if (box_index.dim_size(0) != *num_boxes) { + return errors::InvalidArgument("box_index has incompatible shape"); + } + return Status::OK(); } -// Verifies that all values in box_ind are in [0, batch). +// Conditionally calls the compute callback if all values in box_index are in +// [0, batch_size) then calls done. template -inline void CheckValidBoxInd( - OpKernelContext* context, - typename TTypes::ConstTensor box_ind_data, int batch); +inline void RunIfBoxIndexIsValid( + OpKernelContext* context, typename TTypes::ConstTensor box_index, + int batch_size, Callback compute, Callback done); + +// Specialization of CheckValidBoxIndex for a CPUDevice. +template <> +inline void RunIfBoxIndexIsValid( + OpKernelContext* context, typename TTypes::ConstTensor box_index, + int batch_size, Callback compute, Callback done) { + const int num_boxes = box_index.dimension(0); + for (int b = 0; b < num_boxes; ++b) { + OP_REQUIRES_ASYNC( + context, FastBoundsCheck(box_index(b), batch_size), + errors::OutOfRange("box_index has values outside [0, batch_size)"), + done); + } + compute(); + done(); +} + +} // namespace template -class CropAndResizeOp : public OpKernel { +class CropAndResizeOp : public AsyncOpKernel { public: - explicit CropAndResizeOp(OpKernelConstruction* context) : OpKernel(context) { + explicit CropAndResizeOp(OpKernelConstruction* context) + : AsyncOpKernel(context) { string method; OP_REQUIRES_OK(context, context->GetAttr("method", &method)); OP_REQUIRES(context, method == "bilinear", @@ -80,69 +112,77 @@ class CropAndResizeOp : public OpKernel { &extrapolation_value_)); } - void Compute(OpKernelContext* context) override { - // The shape of 'image' is [batch, image_height, image_width, channels]. + void ComputeAsync(OpKernelContext* context, DoneCallback done) override { + // The shape of 'image' is [batch_size, image_height, image_width, + // channels]. const Tensor& image = context->input(0); - OP_REQUIRES(context, image.dims() == 4, - errors::InvalidArgument("input image must be 4-D", - image.shape().DebugString())); - - const int batch = image.dim_size(0); - const int image_height = image.dim_size(1); - const int image_width = image.dim_size(2); - const int depth = image.dim_size(3); - OP_REQUIRES(context, image_height > 0 && image_width > 0, - errors::InvalidArgument("image dimensions must be positive")); - // The shape of 'boxes' is [num_boxes, 4]. const Tensor& boxes = context->input(1); - - // The shape of 'box_ind' is [num_boxes]. - const Tensor& box_ind = context->input(2); - - int num_boxes = 0; - ParseAndCheckBoxSizes(context, boxes, box_ind, &num_boxes); - + // The shape of 'box_index' is [num_boxes]. + const Tensor& box_index = context->input(2); // The shape of 'crop_size' is [2]. 
const Tensor& crop_size = context->input(3); - OP_REQUIRES(context, crop_size.dims() == 1, - errors::InvalidArgument("crop_size must be 1-D", - crop_size.shape().DebugString())); - OP_REQUIRES(context, crop_size.dim_size(0) == 2, - errors::InvalidArgument("crop_size must have two elements", - crop_size.shape().DebugString())); - + // Validate inputs dimensions. + OP_REQUIRES_ASYNC(context, image.dims() == 4, + errors::InvalidArgument("input image must be 4-D", + image.shape().DebugString()), + done); + const int batch_size = image.dim_size(0); + const int image_height = image.dim_size(1); + const int image_width = image.dim_size(2); + const int depth = image.dim_size(3); + OP_REQUIRES_ASYNC( + context, image_height > 0 && image_width > 0, + errors::InvalidArgument("image dimensions must be positive"), done); + int num_boxes = 0; + OP_REQUIRES_OK_ASYNC( + context, ParseAndCheckBoxSizes(boxes, box_index, &num_boxes), done); + + OP_REQUIRES_ASYNC(context, crop_size.dims() == 1, + errors::InvalidArgument("crop_size must be 1-D", + crop_size.shape().DebugString()), + done); + OP_REQUIRES_ASYNC( + context, crop_size.dim_size(0) == 2, + errors::InvalidArgument("crop_size must have two elements", + crop_size.shape().DebugString()), + done); + + // Copy and validate crop sizes. auto crop_size_vec = crop_size.vec(); const int crop_height = internal::SubtleMustCopy(crop_size_vec(0)); const int crop_width = internal::SubtleMustCopy(crop_size_vec(1)); - OP_REQUIRES(context, crop_height > 0 && crop_width > 0, - errors::InvalidArgument("crop dimensions must be positive")); + OP_REQUIRES_ASYNC( + context, crop_height > 0 && crop_width > 0, + errors::InvalidArgument("crop dimensions must be positive"), done); // Allocate output tensor. Tensor* output = nullptr; - OP_REQUIRES_OK( + OP_REQUIRES_OK_ASYNC( context, context->allocate_output( 0, TensorShape({num_boxes, crop_height, crop_width, depth}), - &output)); - - typename TTypes::ConstTensor image_data = image.tensor(); - typename TTypes::ConstTensor boxes_data = - boxes.tensor(); - typename TTypes::ConstTensor box_ind_data = - box_ind.tensor(); - typename TTypes::Tensor crops_data = output->tensor(); - - CheckValidBoxInd(context, box_ind_data, batch); - - bool status = functor::CropAndResize()( - context->eigen_device(), image_data, boxes_data, box_ind_data, - extrapolation_value_, crops_data); - if (!status) { - context->SetStatus( - errors::Internal("Failed launch CropAndResizeKernel.")); - } + &output), + done); + + auto compute_callback = [this, context, output]() { + const Tensor& image = context->input(0); + const Tensor& boxes = context->input(1); + const Tensor& box_index = context->input(2); + const bool status = functor::CropAndResize()( + context->eigen_device(), image.tensor(), + boxes.tensor(), box_index.tensor(), + extrapolation_value_, output->tensor()); + if (!status) { + context->SetStatus( + errors::Internal("Failed launch CropAndResizeKernel.")); + } + }; + + RunIfBoxIndexIsValid(context, box_index.tensor(), + batch_size, std::move(compute_callback), + std::move(done)); } private: @@ -155,10 +195,10 @@ template struct CropAndResize { bool operator()(const CPUDevice& d, typename TTypes::ConstTensor image, typename TTypes::ConstTensor boxes, - typename TTypes::ConstTensor box_ind, + typename TTypes::ConstTensor box_index, float extrapolation_value, typename TTypes::Tensor crops) { - const int batch = image.dimension(0); + const int batch_size = image.dimension(0); const int image_height = image.dimension(1); const int image_width = 
image.dimension(2); @@ -173,8 +213,8 @@ struct CropAndResize { const float y2 = boxes(b, 2); const float x2 = boxes(b, 3); - const int32 b_in = box_ind(b); - if (b_in < 0 || b_in >= batch) { + const int32 b_in = box_index(b); + if (!FastBoundsCheck(b_in, batch_size)) { continue; } @@ -235,89 +275,94 @@ struct CropAndResize { return true; } }; + } // namespace functor template -class CropAndResizeGradImageOp : public OpKernel { +class CropAndResizeGradImageOp : public AsyncOpKernel { public: explicit CropAndResizeGradImageOp(OpKernelConstruction* context) - : OpKernel(context) { + : AsyncOpKernel(context) { string method; OP_REQUIRES_OK(context, context->GetAttr("method", &method)); OP_REQUIRES(context, method == "bilinear", errors::InvalidArgument("method must be 'bilinear'", method)); } - void Compute(OpKernelContext* context) override { + void ComputeAsync(OpKernelContext* context, DoneCallback done) override { // The shape of 'grads' is [num_boxes, crop_height, crop_width, depth]. const Tensor& grads = context->input(0); - - OP_REQUIRES(context, grads.dims() == 4, - errors::InvalidArgument("grads image must be 4-D", - grads.shape().DebugString())); - const int crop_height = grads.dim_size(1); - const int crop_width = grads.dim_size(2); - OP_REQUIRES(context, crop_height > 0 && crop_width > 0, - errors::InvalidArgument("grads dimensions must be positive")); - // The shape of 'boxes' is [num_boxes, 4]. const Tensor& boxes = context->input(1); - - // The shape of 'box_ind' is [num_boxes]. - const Tensor& box_ind = context->input(2); - - int num_boxes = 0; - ParseAndCheckBoxSizes(context, boxes, box_ind, &num_boxes); - - OP_REQUIRES( - context, grads.dim_size(0) == num_boxes, - errors::InvalidArgument("boxes and grads have incompatible shape")); - + // The shape of 'box_index' is [num_boxes]. + const Tensor& box_index = context->input(2); // The shape of 'image_size' is [4]. const Tensor& image_size = context->input(3); - OP_REQUIRES(context, image_size.dims() == 1, - errors::InvalidArgument("image_size must be 1-D", - image_size.shape().DebugString())); - OP_REQUIRES(context, image_size.dim_size(0) == 4, - errors::InvalidArgument("image_size must have 4 elements", - image_size.shape().DebugString())); + // Validate input shapes. 
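      // Because the kernel is now asynchronous, every early exit has to signal
      // completion: the OP_REQUIRES_ASYNC / OP_REQUIRES_OK_ASYNC checks below
      // take `done` as their last argument and invoke it before returning when
      // a check fails, and RunIfBoxIndexIsValid only runs the compute callback
      // (followed by `done`) when every box index is inside [0, batch_size).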
+ OP_REQUIRES_ASYNC(context, grads.dims() == 4, + errors::InvalidArgument("grads image must be 4-D", + grads.shape().DebugString()), + done); + const int crop_height = grads.dim_size(1); + const int crop_width = grads.dim_size(2); + OP_REQUIRES_ASYNC( + context, crop_height > 0 && crop_width > 0, + errors::InvalidArgument("grads dimensions must be positive"), done); + int num_boxes = 0; + OP_REQUIRES_OK_ASYNC( + context, ParseAndCheckBoxSizes(boxes, box_index, &num_boxes), done); + OP_REQUIRES_ASYNC( + context, grads.dim_size(0) == num_boxes, + errors::InvalidArgument("boxes and grads have incompatible shape"), + done); + + OP_REQUIRES_ASYNC(context, image_size.dims() == 1, + errors::InvalidArgument("image_size must be 1-D", + image_size.shape().DebugString()), + done); + OP_REQUIRES_ASYNC(context, image_size.dim_size(0) == 4, + errors::InvalidArgument("image_size must have 4 elements", + image_size.shape().DebugString()), + done); auto image_size_vec = image_size.vec(); - const int batch = internal::SubtleMustCopy(image_size_vec(0)); + const int batch_size = internal::SubtleMustCopy(image_size_vec(0)); const int image_height = internal::SubtleMustCopy(image_size_vec(1)); const int image_width = internal::SubtleMustCopy(image_size_vec(2)); const int depth = internal::SubtleMustCopy(image_size_vec(3)); - - OP_REQUIRES(context, image_height > 0 && image_width > 0, - errors::InvalidArgument("image dimensions must be positive")); - OP_REQUIRES( + OP_REQUIRES_ASYNC( + context, image_height > 0 && image_width > 0, + errors::InvalidArgument("image dimensions must be positive"), done); + OP_REQUIRES_ASYNC( context, grads.dim_size(3) == depth, - errors::InvalidArgument("image_size and grads are incompatible")); + errors::InvalidArgument("image_size and grads are incompatible"), done); // Allocate output tensor. 
Tensor* output = nullptr; - OP_REQUIRES_OK( - context, context->allocate_output( - 0, TensorShape({batch, image_height, image_width, depth}), - &output)); - - typename TTypes::ConstTensor grads_data = - grads.tensor(); - typename TTypes::ConstTensor boxes_data = - boxes.tensor(); - typename TTypes::ConstTensor box_ind_data = - box_ind.tensor(); - typename TTypes::Tensor output_data = output->tensor(); - - CheckValidBoxInd(context, box_ind_data, batch); - - bool status = functor::CropAndResizeBackpropImage()( - context->eigen_device(), grads_data, boxes_data, box_ind_data, - output_data); - if (!status) { - context->SetStatus( - errors::Internal("Failed launch CropAndResizeBackpropImageKernel.")); - } + OP_REQUIRES_OK_ASYNC( + context, + context->allocate_output( + 0, TensorShape({batch_size, image_height, image_width, depth}), + &output), + done); + + auto compute_callback = [context, output]() { + const Tensor& grads = context->input(0); + const Tensor& boxes = context->input(1); + const Tensor& box_index = context->input(2); + const bool status = functor::CropAndResizeBackpropImage()( + context->eigen_device(), grads.tensor(), + boxes.tensor(), box_index.tensor(), + output->tensor()); + if (!status) { + context->SetStatus(errors::Internal( + "Failed launch CropAndResizeBackpropImage kernel.")); + } + }; + + RunIfBoxIndexIsValid(context, box_index.tensor(), + batch_size, std::move(compute_callback), + std::move(done)); } }; @@ -328,9 +373,9 @@ struct CropAndResizeBackpropImage { bool operator()(const CPUDevice& d, typename TTypes::ConstTensor grads, typename TTypes::ConstTensor boxes, - typename TTypes::ConstTensor box_ind, + typename TTypes::ConstTensor box_index, typename TTypes::Tensor grads_image) { - const int batch = grads_image.dimension(0); + const int batch_size = grads_image.dimension(0); const int image_height = grads_image.dimension(1); const int image_width = grads_image.dimension(2); @@ -347,8 +392,8 @@ struct CropAndResizeBackpropImage { const float y2 = boxes(b, 2); const float x2 = boxes(b, 3); - const int32 b_in = box_ind(b); - if (b_in < 0 || b_in >= batch) { + const int32 b_in = box_index(b); + if (!FastBoundsCheck(b_in, batch_size)) { continue; } @@ -399,83 +444,90 @@ struct CropAndResizeBackpropImage { return true; } }; + } // namespace functor template -class CropAndResizeGradBoxesOp : public OpKernel { +class CropAndResizeGradBoxesOp : public AsyncOpKernel { public: explicit CropAndResizeGradBoxesOp(OpKernelConstruction* context) - : OpKernel(context) { + : AsyncOpKernel(context) { string method; OP_REQUIRES_OK(context, context->GetAttr("method", &method)); OP_REQUIRES(context, method == "bilinear", errors::InvalidArgument("method must be 'bilinear'", method)); } - void Compute(OpKernelContext* context) override { + void ComputeAsync(OpKernelContext* context, DoneCallback done) override { // The shape of 'grads' is [num_boxes, crop_height, crop_width, depth]. const Tensor& grads = context->input(0); + // The shape of 'boxes' is [num_boxes, 4]. + const Tensor& boxes = context->input(2); + // The shape of 'box_index' is [num_boxes]. + const Tensor& box_index = context->input(3); + // The shape of 'image' is [batch_size, image_height, image_width, depth]. + const Tensor& image = context->input(1); - OP_REQUIRES(context, grads.dims() == 4, - errors::InvalidArgument("grads image must be 4-D", - grads.shape().DebugString())); - + // Validate input shapes. 
+ OP_REQUIRES_ASYNC(context, grads.dims() == 4, + errors::InvalidArgument("grads image must be 4-D", + grads.shape().DebugString()), + done); const int crop_height = grads.dim_size(1); const int crop_width = grads.dim_size(2); const int depth = grads.dim_size(3); - OP_REQUIRES(context, crop_height > 0 && crop_width > 0, - errors::InvalidArgument("grads dimensions must be positive")); - - // The shape of 'image' is [batch, image_height, image_width, depth]. - const Tensor& image = context->input(1); - OP_REQUIRES(context, image.dims() == 4, - errors::InvalidArgument("input image must be 4-D", - image.shape().DebugString())); - - const int batch = image.dim_size(0); + OP_REQUIRES_ASYNC( + context, crop_height > 0 && crop_width > 0, + errors::InvalidArgument("grads dimensions must be positive"), done); + + OP_REQUIRES_ASYNC(context, image.dims() == 4, + errors::InvalidArgument("input image must be 4-D", + image.shape().DebugString()), + done); + const int batch_size = image.dim_size(0); const int image_height = image.dim_size(1); const int image_width = image.dim_size(2); - OP_REQUIRES(context, image_height > 0 && image_width > 0, - errors::InvalidArgument("image dimensions must be positive")); - OP_REQUIRES(context, image.dim_size(3) == depth, - errors::InvalidArgument("image, grads depth differ")); - - // The shape of 'boxes' is [num_boxes, 4]. - const Tensor& boxes = context->input(2); - - // The shape of 'box_ind' is [num_boxes]. - const Tensor& box_ind = context->input(3); + OP_REQUIRES_ASYNC( + context, image_height > 0 && image_width > 0, + errors::InvalidArgument("image dimensions must be positive"), done); + OP_REQUIRES_ASYNC(context, image.dim_size(3) == depth, + errors::InvalidArgument("image, grads depth differ"), + done); int num_boxes = 0; - ParseAndCheckBoxSizes(context, boxes, box_ind, &num_boxes); + OP_REQUIRES_OK_ASYNC( + context, ParseAndCheckBoxSizes(boxes, box_index, &num_boxes), done); - OP_REQUIRES( + OP_REQUIRES_ASYNC( context, grads.dim_size(0) == num_boxes, - errors::InvalidArgument("boxes and grads have incompatible shape")); + errors::InvalidArgument("boxes and grads have incompatible shape"), + done); // Allocate output tensor. 
Tensor* output = nullptr; - OP_REQUIRES_OK(context, context->allocate_output( - 0, TensorShape({num_boxes, 4}), &output)); - - typename TTypes::ConstTensor grads_data = - grads.tensor(); - typename TTypes::ConstTensor image_data = image.tensor(); - typename TTypes::ConstTensor boxes_data = - boxes.tensor(); - typename TTypes::ConstTensor box_ind_data = - box_ind.tensor(); - typename TTypes::Tensor output_data = output->tensor(); - - CheckValidBoxInd(context, box_ind_data, batch); - - bool status = functor::CropAndResizeBackpropBoxes()( - context->eigen_device(), grads_data, image_data, boxes_data, - box_ind_data, output_data); - if (!status) { - context->SetStatus( - errors::Internal("Failed launch CropAndResizeBackpropBoxesKernel.")); - } + OP_REQUIRES_OK_ASYNC( + context, + context->allocate_output(0, TensorShape({num_boxes, 4}), &output), + done); + + auto compute_callback = [context, output]() { + const Tensor& grads = context->input(0); + const Tensor& image = context->input(1); + const Tensor& boxes = context->input(2); + const Tensor& box_index = context->input(3); + const bool status = functor::CropAndResizeBackpropBoxes()( + context->eigen_device(), grads.tensor(), + image.tensor(), boxes.tensor(), + box_index.tensor(), output->tensor()); + if (!status) { + context->SetStatus(errors::Internal( + "Failed launch CropAndResizeBackpropBoxes kernel.")); + } + }; + + RunIfBoxIndexIsValid(context, box_index.tensor(), + batch_size, std::move(compute_callback), + std::move(done)); } }; @@ -487,9 +539,9 @@ struct CropAndResizeBackpropBoxes { typename TTypes::ConstTensor grads, typename TTypes::ConstTensor image, typename TTypes::ConstTensor boxes, - typename TTypes::ConstTensor box_ind, + typename TTypes::ConstTensor box_index, typename TTypes::Tensor grads_boxes) { - const int batch = image.dimension(0); + const int batch_size = image.dimension(0); const int image_height = image.dimension(1); const int image_width = image.dimension(2); @@ -506,8 +558,8 @@ struct CropAndResizeBackpropBoxes { const float y2 = boxes(b, 2); const float x2 = boxes(b, 3); - const int32 b_in = box_ind(b); - if (b_in < 0 || b_in >= batch) { + const int32 b_in = box_index(b); + if (!FastBoundsCheck(b_in, batch_size)) { continue; } @@ -589,30 +641,19 @@ struct CropAndResizeBackpropBoxes { return true; } }; -} // namespace functor -// Specialization of CheckValidBoxInd for a CPUDevice. 
-template <> -inline void CheckValidBoxInd( - OpKernelContext* context, typename TTypes::ConstTensor box_ind, - int batch) { - const int num_boxes = box_ind.dimension(0); - for (int b = 0; b < num_boxes; ++b) { - OP_REQUIRES(context, box_ind(b) >= 0 && box_ind(b) < batch, - errors::OutOfRange("box_ind has values outside [0, batch)")); - } -} +} // namespace functor -#define REGISTER_KERNEL(T) \ - REGISTER_KERNEL_BUILDER(Name("CropAndResize") \ - .Device(DEVICE_CPU) \ - .TypeConstraint("T") \ - .HostMemory("crop_size"), \ - CropAndResizeOp); \ - \ - REGISTER_KERNEL_BUILDER(Name("CropAndResizeGradBoxes") \ - .Device(DEVICE_CPU) \ - .TypeConstraint("T"), \ +#define REGISTER_KERNEL(T) \ + REGISTER_KERNEL_BUILDER(Name("CropAndResize") \ + .Device(DEVICE_CPU) \ + .TypeConstraint("T") \ + .HostMemory("crop_size"), \ + CropAndResizeOp); \ + \ + REGISTER_KERNEL_BUILDER(Name("CropAndResizeGradBoxes") \ + .Device(DEVICE_CPU) \ + .TypeConstraint("T"), \ CropAndResizeGradBoxesOp); TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL); @@ -634,50 +675,86 @@ TF_CALL_double(REGISTER_KERNEL); #if GOOGLE_CUDA -// Forward declaration of the CheckValidBoxIndHelper specialization for GPU. +// Forward declaration of the CheckValidBoxIndexHelper specialization for GPU. namespace functor { template <> -void CheckValidBoxIndHelper::operator()( - const GPUDevice& d, typename TTypes::ConstTensor box_ind, - int batch, typename TTypes::Tensor isvalid); -extern template struct CheckValidBoxIndHelper; +void CheckValidBoxIndexHelper::operator()( + const GPUDevice& d, typename TTypes::ConstTensor box_index, + int batch_size, typename TTypes::Tensor isvalid); +extern template struct CheckValidBoxIndexHelper; } // namespace functor -// Specialization of CheckValidBoxInd for a GPUDevice. +namespace { + +// Specialization of CheckValidBoxIndex for a GPUDevice. template <> -inline void CheckValidBoxInd( - OpKernelContext* context, typename TTypes::ConstTensor box_ind, - int batch) { - const int num_boxes = box_ind.dimension(0); +inline void RunIfBoxIndexIsValid( + OpKernelContext* context, typename TTypes::ConstTensor box_index, + int batch_size, Callback compute, Callback done) { + const int num_boxes = box_index.dimension(0); if (num_boxes == 0) { + compute(); + done(); return; } - Tensor isvalid_tensor; - OP_REQUIRES_OK(context, - context->allocate_temp(DataTypeToEnum::value, - TensorShape({}), &isvalid_tensor)); - typename TTypes::Tensor isvalid = isvalid_tensor.tensor(); + Tensor isvalid_dev_tensor; + OP_REQUIRES_OK_ASYNC( + context, + context->allocate_temp(DataTypeToEnum::value, TensorShape({}), + &isvalid_dev_tensor), + done); + typename TTypes::Tensor isvalid_dev = + isvalid_dev_tensor.tensor(); - functor::CheckValidBoxIndHelper()( - context->eigen_device(), box_ind, batch, isvalid); + // Run the actual box check on the device. + functor::CheckValidBoxIndexHelper()( + context->eigen_device(), box_index, batch_size, isvalid_dev); + // Copy the result back to the host. 
auto* stream = context->op_device_context()->stream(); - OP_REQUIRES(context, stream, errors::Internal("No GPU stream available.")); - - bool isvalid_host = false; - perftools::gputools::DeviceMemoryBase isvalid_gpu(isvalid.data(), - sizeof(bool)); - stream->ThenMemcpy(&isvalid_host, isvalid_gpu, sizeof(bool)); - stream->BlockHostUntilDone(); - - OP_REQUIRES(context, stream->ok(), - errors::Internal("cudaMemcpy from device to host failed")); - - OP_REQUIRES(context, isvalid_host, - errors::OutOfRange("box_ind has values outside [0, batch)")); + OP_REQUIRES_ASYNC(context, stream, + errors::Internal("No GPU stream available."), done); + Tensor isvalid_host_tensor; + // Use pinned host memory on the host to avoid unnecessary + // synchronization. + AllocatorAttributes alloc_attr; + alloc_attr.set_on_host(true); + alloc_attr.set_gpu_compatible(true); + OP_REQUIRES_OK_ASYNC( + context, + context->allocate_temp(DataTypeToEnum::value, TensorShape({}), + &isvalid_host_tensor, alloc_attr), + done); + typename TTypes::Tensor isvalid_host = + isvalid_host_tensor.tensor(); + + perftools::gputools::DeviceMemoryBase wrapped(isvalid_dev.data(), + sizeof(bool)); + const bool status = stream + ->ThenMemcpy(isvalid_host.data() /* destination */, + wrapped /* source */, sizeof(bool)) + .ok(); + OP_REQUIRES_ASYNC( + context, status, + errors::Internal("Failed to launch copy of isvalid from device to host."), + done); + + auto wrapped_callback = [context, isvalid_host, compute, done]() { + OP_REQUIRES_ASYNC( + context, isvalid_host(), + errors::OutOfRange("box_index has values outside [0, batch_size)"), + done); + compute(); + done(); + }; + + context->device()->tensorflow_gpu_device_info()->event_mgr->ThenExecute( + stream, wrapped_callback); } +} // namespace + #define REGISTER_KERNEL(T) \ REGISTER_KERNEL_BUILDER(Name("CropAndResize") \ .Device(DEVICE_GPU) \ diff --git a/tensorflow/core/kernels/crop_and_resize_op.h b/tensorflow/core/kernels/crop_and_resize_op.h index 22df1bdd56..460dbad22b 100644 --- a/tensorflow/core/kernels/crop_and_resize_op.h +++ b/tensorflow/core/kernels/crop_and_resize_op.h @@ -53,12 +53,12 @@ struct CropAndResizeBackpropBoxes { }; template -struct CheckValidBoxIndHelper { - // Checks if all values in box_ind are in [0, batch). +struct CheckValidBoxIndexHelper { + // Checks if all values in box_index are in [0, batch). 
void operator()(const Device& d, - typename TTypes::ConstTensor box_ind, int batch, + typename TTypes::ConstTensor box_index, int batch, typename TTypes::Tensor isvalid) { - isvalid.device(d) = ((box_ind >= 0) && (box_ind < batch)).all(); + isvalid.device(d) = ((box_index >= 0) && (box_index < batch)).all(); } }; diff --git a/tensorflow/core/kernels/crop_and_resize_op_gpu.cu.cc b/tensorflow/core/kernels/crop_and_resize_op_gpu.cu.cc index 254475db46..c1235fda89 100644 --- a/tensorflow/core/kernels/crop_and_resize_op_gpu.cu.cc +++ b/tensorflow/core/kernels/crop_and_resize_op_gpu.cu.cc @@ -440,7 +440,7 @@ TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS); #undef DEFINE_GPU_SPECS -template struct CheckValidBoxIndHelper; +template struct CheckValidBoxIndexHelper; } // namespace functor } // namespace tensorflow diff --git a/tensorflow/core/kernels/crop_and_resize_op_test.cc b/tensorflow/core/kernels/crop_and_resize_op_test.cc index 3a7f180598..d6139dae96 100644 --- a/tensorflow/core/kernels/crop_and_resize_op_test.cc +++ b/tensorflow/core/kernels/crop_and_resize_op_test.cc @@ -251,7 +251,7 @@ TEST_F(CropAndResizeOpTest, TestInvalidBoxIndexShape) { Status s = RunOpKernel(); ASSERT_FALSE(s.ok()); EXPECT_TRUE( - StringPiece(s.ToString()).contains("box_ind has incompatible shape")) + StringPiece(s.ToString()).contains("box_index has incompatible shape")) << s; } @@ -264,8 +264,10 @@ TEST_F(CropAndResizeOpTest, TestInvalidBoxIndex) { Status s = RunOpKernel(); ASSERT_FALSE(s.ok()); EXPECT_TRUE(StringPiece(s.ToString()) - .contains("box_ind has values outside [0, batch)")) + .contains("box_index has values outside [0, batch_size)")) << s; } +// TODO(zhengxq, rmlarsen): Add a benchmark. + } // namespace tensorflow diff --git a/tensorflow/core/kernels/lookup_table_op.h b/tensorflow/core/kernels/lookup_table_op.h index 4cd25a3cc6..ff23a09a24 100644 --- a/tensorflow/core/kernels/lookup_table_op.h +++ b/tensorflow/core/kernels/lookup_table_op.h @@ -64,8 +64,8 @@ class LookupTableOp : public OpKernel { return ctx->status(); } if (ctx->track_allocations()) { - ctx->record_device_persistent_memory_allocation( - container->MemoryUsed()); + ctx->record_host_persistent_memory_allocation( + container->MemoryUsed() + table_handle_.AllocatedBytes()); } *ret = container; return Status::OK(); @@ -225,6 +225,15 @@ class HashTable : public InitializableLookupTable { return Status::OK(); } + int64 MemoryUsed() const override { + if (table_) { + const int64 num_elements = table_->size(); + return num_elements * (sizeof(K) + sizeof(V)); + } else { + return 0; + } + } + private: std::unique_ptr> table_; }; diff --git a/tensorflow/core/ops/data_flow_ops.cc b/tensorflow/core/ops/data_flow_ops.cc index f82e9d1eb7..f35a1bb648 100644 --- a/tensorflow/core/ops/data_flow_ops.cc +++ b/tensorflow/core/ops/data_flow_ops.cc @@ -623,7 +623,17 @@ REGISTER_OP("QueueDequeueV2") .Output("components: component_types") .Attr("component_types: list(type) >= 1") .Attr("timeout_ms: int = -1") - .SetShapeFn(shape_inference::UnknownShape) + .SetShapeFn([](InferenceContext* c) { + if (c->num_outputs() == 1) { + c->set_output(0, c->input_handle_shape(0)); + } else { + // TODO(vrv): handle the case of multiple outputs. + for (int i = 0; i < c->num_outputs(); ++i) { + c->set_output(i, c->UnknownShape()); + } + } + return Status::OK(); + }) .Doc(R"doc( Dequeues a tuple of one or more tensors from the given queue. 
diff --git a/tensorflow/core/platform/default/build_config/BUILD b/tensorflow/core/platform/default/build_config/BUILD index 8bc412c5d8..9e3d5f354d 100644 --- a/tensorflow/core/platform/default/build_config/BUILD +++ b/tensorflow/core/platform/default/build_config/BUILD @@ -58,6 +58,22 @@ cc_library( ], ) +# Dummy stream executor cuda plugins. +cc_library( + name = "cublas_plugin", + srcs = [], +) + +cc_library( + name = "cufft_plugin", + srcs = [], +) + +cc_library( + name = "cudnn_plugin", + srcs = [], +) + # OSX framework for device driver access cc_library( name = "IOKit", diff --git a/tensorflow/docs_src/api_guides/python/contrib.graph_editor.md b/tensorflow/docs_src/api_guides/python/contrib.graph_editor.md index f611624079..de4f126507 100644 --- a/tensorflow/docs_src/api_guides/python/contrib.graph_editor.md +++ b/tensorflow/docs_src/api_guides/python/contrib.graph_editor.md @@ -137,16 +137,16 @@ which to operate must always be given explicitly. This is the reason why ## Module: reroute -* @{tf.contrib.graph_editor.reroute.swap_ts} -* @{tf.contrib.graph_editor.reroute.reroute_ts} -* @{tf.contrib.graph_editor.reroute.swap_inputs} -* @{tf.contrib.graph_editor.reroute.reroute_inputs} -* @{tf.contrib.graph_editor.reroute.swap_outputs} -* @{tf.contrib.graph_editor.reroute.reroute_outputs} -* @{tf.contrib.graph_editor.reroute.swap_ios} -* @{tf.contrib.graph_editor.reroute.reroute_ios} -* @{tf.contrib.graph_editor.reroute.remove_control_inputs} -* @{tf.contrib.graph_editor.reroute.add_control_inputs} +* @{tf.contrib.graph_editor.swap_ts} +* @{tf.contrib.graph_editor.reroute_ts} +* @{tf.contrib.graph_editor.swap_inputs} +* @{tf.contrib.graph_editor.reroute_inputs} +* @{tf.contrib.graph_editor.swap_outputs} +* @{tf.contrib.graph_editor.reroute_outputs} +* @{tf.contrib.graph_editor.swap_ios} +* @{tf.contrib.graph_editor.reroute_ios} +* @{tf.contrib.graph_editor.remove_control_inputs} +* @{tf.contrib.graph_editor.add_control_inputs} ## Module: edit diff --git a/tensorflow/docs_src/api_guides/python/contrib.linalg.md b/tensorflow/docs_src/api_guides/python/contrib.linalg.md index efc2d76ef1..b2c7fcf6bb 100644 --- a/tensorflow/docs_src/api_guides/python/contrib.linalg.md +++ b/tensorflow/docs_src/api_guides/python/contrib.linalg.md @@ -21,7 +21,7 @@ Subclasses of `LinearOperator` provide a access to common methods on a * @{tf.contrib.linalg.LinearOperatorDiag} * @{tf.contrib.linalg.LinearOperatorIdentity} * @{tf.contrib.linalg.LinearOperatorScaledIdentity} -* @{tf.contrib.linalg.LinearOperatorMatrix} +* @{tf.contrib.linalg.LinearOperatorFullMatrix} * @{tf.contrib.linalg.LinearOperatorTriL} * @{tf.contrib.linalg.LinearOperatorUDVHUpdate} diff --git a/tensorflow/docs_src/api_guides/python/contrib.losses.md b/tensorflow/docs_src/api_guides/python/contrib.losses.md index cb93f9d549..8c289dd556 100644 --- a/tensorflow/docs_src/api_guides/python/contrib.losses.md +++ b/tensorflow/docs_src/api_guides/python/contrib.losses.md @@ -13,8 +13,8 @@ of samples in the batch and `d1` ... `dN` are the remaining dimensions. It is common, when training with multiple loss functions, to adjust the relative strengths of individual losses. This is performed by rescaling the losses via a `weight` parameter passed to the loss functions. 
For example, if we were -training with both log_loss and sum_of_squares_loss, and we wished that the -log_loss penalty be twice as severe as the sum_of_squares_loss, we would +training with both log_loss and mean_square_error, and we wished that the +log_loss penalty be twice as severe as the mean_square_error, we would implement this as: ```python @@ -22,7 +22,7 @@ implement this as: tf.contrib.losses.log(predictions, labels, weight=2.0) # Uses default weight of 1.0 - tf.contrib.losses.sum_of_squares(predictions, labels) + tf.contrib.losses.mean_square_error(predictions, labels) # All the losses are collected into the `GraphKeys.LOSSES` collection. losses = tf.get_collection(tf.GraphKeys.LOSSES) @@ -74,7 +74,7 @@ these predictions. predictions = MyModelPredictions(images) weight = tf.cast(tf.greater(depths, 0), tf.float32) - loss = tf.contrib.losses.sum_of_squares(predictions, depths, weight) + loss = tf.contrib.losses.mean_square_error(predictions, depths, weight) ``` Note that when using weights for the losses, the final average is computed @@ -100,7 +100,7 @@ weighted average over the individual prediction errors: weight = MyComplicatedWeightingFunction(labels) weight = tf.div(weight, tf.size(weight)) - loss = tf.contrib.losses.sum_of_squares(predictions, depths, weight) + loss = tf.contrib.losses.mean_square_error(predictions, depths, weight) ``` @{tf.contrib.losses.absolute_difference} @@ -118,9 +118,4 @@ weighted average over the individual prediction errors: @{tf.contrib.losses.softmax_cross_entropy} @{tf.contrib.losses.sparse_softmax_cross_entropy} -The following are deprecated in favor of `mean_pairwise_squared_error` and -`mean_squared_error`. -@{tf.contrib.losses.sum_of_pairwise_squares} -@{tf.contrib.losses.sum_of_squares} - diff --git a/tensorflow/docs_src/get_started/tflearn.md b/tensorflow/docs_src/get_started/tflearn.md index 079349be32..ed21969b3e 100644 --- a/tensorflow/docs_src/get_started/tflearn.md +++ b/tensorflow/docs_src/get_started/tflearn.md @@ -278,7 +278,7 @@ Then, the code creates a `DNNClassifier` model using the following arguments: The `tf.contrib.learn` API uses input functions, which create the TensorFlow operations that generate data for the model. In this case, the data is small -enough that it can be stored in @{tf.constant TensorFlow constants}. The +enough that it can be stored in @{tf.constant$TensorFlow constants}. The following code produces the simplest possible input pipeline: ```python diff --git a/tensorflow/docs_src/install/install_java.md b/tensorflow/docs_src/install/install_java.md index 111b046689..a20fccffd5 100644 --- a/tensorflow/docs_src/install/install_java.md +++ b/tensorflow/docs_src/install/install_java.md @@ -211,15 +211,20 @@ two files are available to the JVM: * the downloaded `.jar` file * the extracted JNI library -For example, the following command line executes the `HelloTF` program: +For example, the following command line executes the `HelloTF` program on Linux +and Mac OS X:
java -cp libtensorflow-1.1.0.jar:. -Djava.library.path=./jni HelloTF
+And the following command line executes the `HelloTF` program on Windows: + +
java -cp libtensorflow-1.1.0-rc2.jar;. -Djava.library.path=jni HelloTF
+ If the program prints Hello from version, you've successfully installed TensorFlow for Java and are ready to use the API. If the program outputs something else, check -[Stack Overflow](http://stackoverflow.com/questions/tagged/tensorflow) -for possible solutions. +[Stack Overflow](http://stackoverflow.com/questions/tagged/tensorflow) for +possible solutions. ### Advanced Example diff --git a/tensorflow/docs_src/performance/benchmarks.md b/tensorflow/docs_src/performance/benchmarks.md index 8c0cff138d..bfb47d9f90 100644 --- a/tensorflow/docs_src/performance/benchmarks.md +++ b/tensorflow/docs_src/performance/benchmarks.md @@ -1,17 +1,17 @@ -# TensorFlow Performance Benchmarks +# Benchmarks ## Overview A selection of image classification models were tested across multiple platforms to create a point of reference for the TensorFlow community. The methodology, -links to the scripts, and commands to reproduce the results are in the -[appendix](#appendix). +links to the benchmark scripts, and commands to reproduce the results are in the +[Appendix](#appendix). ## Results for image classification models -InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)), -ResNet-50 ([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), -ResNet-152 ([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16 +InceptionV3 ([arXiv:1512.00567](https://arxiv.org/abs/1512.00567)), ResNet-50 +([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), ResNet-152 +([arXiv:1512.03385](https://arxiv.org/abs/1512.03385)), VGG16 ([arXiv:1409.1556](https://arxiv.org/abs/1409.1556)), and [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) were tested using the [ImageNet](http://www.image-net.org/) data set. Tests were @@ -27,32 +27,32 @@ input pipeline and the underlying disk I/O are saturating the compute units. ### Training with NVIDIA® DGX-1™ (NVIDIA® Tesla® P100) -
Details and additional results are in the [Details for NVIDIA® DGX-1™ (NVIDIA® -Tesla® P100)](#details-for-nvidia®-dgx-1™-nvidia®-tesla®-p100) section. +Tesla® P100)](#details_for_nvidia_dgx-1tm_nvidia_tesla_p100) section. ### Training with NVIDIA® Tesla® K80
Details and additional results are in the [Details for Google Compute Engine -(NVIDIA® Tesla® K80)](#details-for-google-compute-engine-nvidia®-tesla®-k80) and +(NVIDIA® Tesla® K80)](#details_for_google_compute_engine_nvidia_tesla_k80) and [Details for Amazon EC2 (NVIDIA® Tesla® -K80)](#details-for-amazon-ec2-nvidia®-tesla®-k80) sections. +K80)](#details_for_amazon_ec2_nvidia_tesla_k80) sections. ### Distributed training with NVIDIA® Tesla® K80
Details and additional results are in the [Details for Amazon EC2 Distributed -(NVIDIA® Tesla® K80)](#details-for-amazon-ec2-distributed-nvidia®-tesla®-k80) +(NVIDIA® Tesla® K80)](#details_for_amazon_ec2_distributed_nvidia_tesla_k80) section. ### Compare synthetic with real data training @@ -82,12 +82,15 @@ section. * **TensorFlow GitHub hash:** b1e174e * **Build Command:** `bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package` -* **Disk:** local SSD +* **Disk:** Local SSD * **DataSet:** ImageNet -Batch size and optimizer used for each model. +Batch size and optimizer used for each model are listed in the table below. In +addition to the batch sizes listed in the table, InceptionV3, ResNet-50, +ResNet-152, and VGG16 were tested with a batch size of 32. Those results are in +the *other results* section. - | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 +Options | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 ------------------ | ----------- | --------- | ---------- | ------- | ----- Batch size per GPU | 64 | 64 | 64 | 512 | 64 Optimizer | sgd | sgd | sgd | sgd | sgd @@ -104,10 +107,8 @@ VGG16 | replicated (with NCCL) | n/a ### Results -Batch size and optimizer used for each model are listed in the table below. -
@@ -136,6 +137,28 @@ GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 Training AlexNet with real data on 8 GPUs was excluded from the graph and table above due to it maxing out the input pipeline. +### Other Results + +The results below are all with a batch size of 32. + +**Training synthetic data** + +GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16 +---- | ----------- | --------- | ---------- | ----- +1 | 128 | 210 | 85.3 | 124 +2 | 259 | 412 | 166 | 241 +4 | 520 | 827 | 330 | 470 +8 | 995 | 1623 | 643 | 738 + +**Training real data** + +GPUs | InceptionV3 | ResNet-50 | ResNet-152 | VGG16 +---- | ----------- | --------- | ---------- | ----- +1 | 130 | 208 | 85.0 | 124 +2 | 257 | 403 | 163 | 221 +4 | 507 | 814 | 325 | 401 +8 | 966 | 1525 | 641 | 619 + ## Details for Google Compute Engine (NVIDIA® Tesla® K80) ### Environment @@ -156,7 +179,7 @@ addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were tested with a batch size of 32. Those results are in the *other results* section. - | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 +Options | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 ------------------ | ----------- | --------- | ---------- | ------- | ----- Batch size per GPU | 64 | 64 | 32 | 512 | 32 Optimizer | sgd | sgd | sgd | sgd | sgd @@ -184,10 +207,10 @@ GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 GPUs | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 ---- | ----------- | --------- | ---------- | ------- | ----- -1 | 30.5 | 56.7 | 20.7 | 639 | 30.2 -2 | 57.8 | 107 | 39 | 1136 | 55.5 -4 | 115 | 211 | 77.3 | 2067 | 106 -8 | 225 | 418 | 150 | 4056 | 213 + 1 | 30.6 | 56.7 | 20.7 | 639 | 30.2 + 2 | 58.4 | 107 | 39.0 | 1136 | 55.5 + 4 | 115 | 211 | 77.3 | 2067 | 106 + 8 | 225 | 422 | 151 | 4056 | 213 ### Other Results @@ -204,10 +227,10 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32) GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32) ---- | --------------------------- | ------------------------- -1 | 29.3 | 53.6 -2 | 55 | 102 -4 | 109 | 200 -8 | 215 | 387 + 1 | 29.5 | 53.6 + 2 | 55.4 | 102 + 4 | 110 | 201 + 8 | 216 | 387 ## Details for Amazon EC2 (NVIDIA® Tesla® K80) @@ -230,7 +253,7 @@ addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were tested with a batch size of 32. Those results are in the *other results* section. - | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 +Options | InceptionV3 | ResNet-50 | ResNet-152 | Alexnet | VGG16 ------------------ | ----------- | --------- | ---------- | ------- | ----- Batch size per GPU | 64 | 64 | 32 | 512 | 32 Optimizer | sgd | sgd | sgd | sgd | sgd @@ -289,7 +312,7 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32) GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32) ---- | --------------------------- | ------------------------- 1 | 30.0 | 53.6 -2 | 57.5 | 101 +2 | 57.5 | 102 4 | 113 | 202 8 | 212 | 379 @@ -313,7 +336,7 @@ addition to the batch sizes listed in the table, InceptionV3 and ResNet-50 were tested with a batch size of 32. Those results are in the *other results* section. - | InceptionV3 | ResNet-50 | ResNet-152 +Options | InceptionV3 | ResNet-50 | ResNet-152 ------------------ | ----------- | --------- | ---------- Batch size per GPU | 64 | 64 | 32 Optimizer | sgd | sgd | sgd @@ -337,7 +360,7 @@ used with the following exceptions: ### Results
@@ -374,34 +397,37 @@ GPUs | InceptionV3 (batch size 32) | ResNet-50 (batch size 32) ### Executing benchmark tests -The code for the benchmarks was created to be both used for benchmarking -TensorFlow as well as used as a tool to test hardware platforms. The benchmark -code includes modes such as `trivial` that run a virtually empty model that is -useful for testing the maximum possibly samples/sec for the input pipeline among -other things. Not only does this test TensorFlow but also the throughput of the -underlying systems. There are two ways to execute the benchmarks in -[tf_cnn_benchmarks.py](TODO: LINK TO GITHUB): +The [benchmark code](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) +was created to be used for benchmarking TensorFlow as well as used as a tool to +test hardware platforms. Techniques used in the benchmark scripts are detailed +in @{$performance_models$High-Performance Models}. + +There are two ways to execute the benchmark code: -1. Execute [tf_cnn_benchmarks.py](TODO: LINK TO GITHUB) directly -2. Utilize the [small wrapper](TODO: LINK TO GITHUB) that helps pick the - correct config +1. Execute [tf_cnn_benchmarks.py](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py) + directly. +2. Utilize the [scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks/main.py) + that helps pick the correct config for each platform executes + `tf_cnn_benchmarks.py`. The wrapper is suggested as a starting point. Then investigate the variety of -options available in `tf_cnn_benchmarks.py`. While the wrapper extensive -examples, below are a couple highlights. +options available in `tf_cnn_benchmarks.py`. Below are a couple examples of +using the wrapper. -Run ResNet-50 on a single instance with 8 GPUs. The `system` argument is used to -determine the optimal configuration. The supported values are gce, aws, and -dgx1. If `system` is not passeed, the best config for the most widely available -hardware is used. +**Single Server** +This example illustrates training ResNet-50 on a single instance with 8 GPUs. +The `system` flag is used to determine the optimal configuration. The +supported values are gce, aws, and dgx1. If `system` is not passed, the best +config for the most widely available hardware is used. ```bash python main.py --model=resnet50 --num_gpus=8 python main.py --system=aws --model=resnet50 --num_gpus=8 ``` -Run ResNet-50 on 2 hosts, e.g. host_0 (10.0.0.1) and host_1 (10.0.0.2), with 8 -GPUs each on aws. +**Distributed** +This example illustrates training ResNet-50 on 2 hosts, e.g. host_0 (10.0.0.1) +and host_1 (10.0.0.2), with 8 GPUs each on AWS (Amazon EC2). ```bash # Run the following commands on host_0 (10.0.0.1): diff --git a/tensorflow/docs_src/performance/index.md b/tensorflow/docs_src/performance/index.md index 0ff4d2ee00..746dc0c74f 100644 --- a/tensorflow/docs_src/performance/index.md +++ b/tensorflow/docs_src/performance/index.md @@ -2,11 +2,19 @@ Performance is often a significant issue when training a machine learning model. This section explains various ways to optimize performance. 
Start -your investigation with the following guide: +your investigation with the @{$performance_guide$Performance Guide} and then go +deeper with techniques detailed in @{$performance_models$High-Performance Models}: - * @{$performance_guide$Performance}, which contains a collection of best + * @{$performance_guide$Performance Guide}, which contains a collection of best practices for optimizing your TensorFlow code. + * @{$performance_models$High-Performance Models}, which contains a collection + advanced techniques to build highly scalable models targeting different + system types and network topologies. + + * @{$benchmarks$Benchmarks}, which contains a collection of benchmark + results. + XLA (Accelerated Linear Algebra) is an experimental compiler for linear algebra that optimizes TensorFlow computations. The following guides explore XLA: diff --git a/tensorflow/docs_src/performance/leftnav_files b/tensorflow/docs_src/performance/leftnav_files index 0f30cc7fa5..d228473220 100644 --- a/tensorflow/docs_src/performance/leftnav_files +++ b/tensorflow/docs_src/performance/leftnav_files @@ -1,4 +1,8 @@ performance_guide.md +performance_models.md +benchmarks.md +quantization.md +>>> xla/index.md xla/broadcasting.md xla/developing_new_backend.md @@ -6,4 +10,3 @@ xla/jit.md xla/operation_semantics.md xla/shapes.md xla/tfcompile.md -quantization.md diff --git a/tensorflow/docs_src/performance/performance_guide.md b/tensorflow/docs_src/performance/performance_guide.md index 8a1bba883a..07c5d3087f 100644 --- a/tensorflow/docs_src/performance/performance_guide.md +++ b/tensorflow/docs_src/performance/performance_guide.md @@ -1,8 +1,10 @@ -# Performance +# Performance Guide This guide contains a collection of best practices for optimizing your TensorFlow code. The best practices apply to both new and experienced -Tensorflow users. +Tensorflow users. As a complement to the best practices in this document, the +@{$performance_models$High-Performance Models} document links to example code +and details for creating models that scale on a variety of hardware. ## Best Practices While optimizing implementations of different types of models can be different, @@ -73,7 +75,7 @@ Unless for a special circumstance or for example code, do not feed data into the session from Python variables, e.g. `dictionary`. ```python -# This will result in poor performance. +# Using feed_dict often results in suboptimal performance when using large inputs. sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) ``` @@ -141,3 +143,4 @@ bn = tf.contrib.layers.batch_norm( The non-fused batch norm does computations using several individual Ops. Fused batch norm combines the individual operations into a single kernel, which runs faster. + diff --git a/tensorflow/docs_src/performance/performance_models.md b/tensorflow/docs_src/performance/performance_models.md index 71c4e6cfe0..70c415a024 100644 --- a/tensorflow/docs_src/performance/performance_models.md +++ b/tensorflow/docs_src/performance/performance_models.md @@ -1,155 +1,109 @@ # High-Performance Models -TensorFlow is a powerful and flexible machine learning platform. -It can be used to distribute model training and inference across a large number -of machines and computation devices. 
- -Its software stack is made of a few layers: - -* a fast and powerful C++ core -* low-level Python primitives that sit right above individual kernels -* a diverse range of high-level libraries that aim to make building real models - easier - -There are many existing examples and tutorials that explain useful features in -TensorFlow. The goal of this set of scripts is to demonstrate that we can build -flexible and powerful high-performance models using the low-level APIs. -In the future, many of the high-performance primitives will be incorporated into -high-level APIs, and made available to more users transparently. -But meanwhile, we show that it is fairly easy for advanced users to build highly -scalable models targeting different system types, network topologies, etc. - -We divide our effort to build high-performance models into three categories: - -1. A fast input pipeline to read data from disk, preprocess it, and make it - ready on the GPU. -2. A high-throughput model that trains on GPU very efficiently. -3. Fast variable and gradients distribution mechanisms that scale well across - many machines and computation devices. +This document and accompanying +[scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) +detail how to build highly scalable models that target a variety of system types +and network topologies. The techniques in this document utilize some low-level +TensorFlow Python primitives. In the future, many of these techniques will be +incorporated into high-level APIs. ## Input Pipeline -The input pipeline is the part of a TensorFlow program that reads input data, -shuffles it, and preprocesses it. - -Among the most important features to build a fast input pipeline: - -* Avoid using feed-dictionary to feed a large amount of data for each step. - * Instead, use reader ops to get data into TensorFlow directly. -* Parallelize data processing. -* Use software pipelining to feed data, so that data is available immediately - when needed. - -One way to implement software pipelining in TensorFlow is through -`tf.FifoQueue`, and it is possible to parallelize data processing through -`tf.train.queue_runner`, which uses Python threads as its underlying -implementation. -This lays the foundation for the current Inception input pipeline. -This design is well built for feeding older generation of GPUs, -but the overhead of Python threads is too large to feed newer GPUs that are four -to five times faster. - -In this model, we explore an alternative design that uses the native -parallelism in TensorFlow. In our example of an image model input pipeline, -there are a few important parts: - -* Choose and read the image files from the disk. -* Decode the image data into images, transform and add distortion so they are -ready to be used. -* Organize the transformed images into a minibatch. -* Transfer the images from CPU to GPU, so they are ready for model training. - -It is important to note that the dominant part of each stage can happen in -parallel with that of other stages: -the file IO uses DMA to transfer the data from hard disk to memory; -image decoding, transformation and distortion are CPU-heavy; -the data transfer from CPU to GPU uses the GPU's copy-engine unit; -and the GPU kernels use the main SMs of the GPU. -It is natural to cut our pipeline into those parts so they can run in parallel -with each other. - -Also, as mentioned earlier, most of the current input pipeline heavily uses -Python threads. 
However, the large overhead introduced by Python threads -severely limits its scalability when the newer GPUs are a lot faster; we can -alleviate this by making a single `session.run` call execute all parts of the -pipeline. - -### Parallelize IO Reads - -In this new model, we use the native parallelism in TensorFlow: TensorFlow -subscribes to an eager-execution model, which means that when nodes in the graph -became available, TensorFlow will try to execute as many of them as possible. - -In order to parallelize reading from hard disk, we use `data_flow_ops.RecordInput` -in this model. -Given a list of input files of TFRecords, `RecordInput` continuously reads -records using background threads, placing the records into its own large, -internal pool of records. -When it is has loaded at least half of its capacity, it produces output tensors. - -Since this op has its internal threads, and is dominated by IO time that doesn’t -consume much CPU time, it naturally runs in parallel with the rest of the model. +The @{$performance_guide$Performance Guide} explains how to identify possible +input pipeline issues and best practices. We found that using @{tf.FIFOQueue} +and @{tf.train.queue_runner} could not saturate multiple current generation GPUs +when using large inputs and processing with higher samples per second, such +as training ImageNet with [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf). +This is due to the the use of Python threads as its underlying implementation. +The overhead of Python threads is too large. + +Another approach, which we have implemented in the +[scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks), +is to build an input pipeline using the native parallelism in TensorFlow. Our +implementation is made up of 3 stages: + +* I/O reads: Choose and read image files from disk. +* Image Processing: Decode image records into images, preprocess, and organize + into mini-batches. +* CPU-to-GPU Data Transfer: Transfer images from CPU to GPU. + +The dominant part of each stage is executed in parallel with the other stages +using `data_flow_ops.StagingArea`. `StagingArea` is a queue-like operator +similar to @{tf.FIFOQueue}. The difference is that `StagingArea` offers simpler +functionality and can be executed on both CPU and GPU in parallel with other +stages. Breaking the input pipeline into 3 stages that operate independently in +parallel is scalable and takes full advantage of large multi-core environments. +The rest of this section details the stages followed by details about using +`data_flow_ops.StagingArea`. + +### Parallelize I/O Reads + +`data_flow_ops.RecordInput` is used to parallelize reading from disk. Given a +list of input files representing TFRecords, `RecordInput` continuously reads +records using background threads. The records are placed into its own large +internal pool and when it has loaded at least half of its capacity, it produces +output tensors. + +This op has its own internal threads that are dominated by I/O time that consume +minimal CPU, which allows it to run smoothly in parallel with the rest of the +model. ### Parallelize Image Processing -After reading from “RecordInput”, the tensors are passed to the input processing -pipeline. For example, if we need to feed 8 GPUs, each with a batch-size of 32, -then for each step we do the following. - -First, read 32x8=256 records, and process them individually, in -parallel. 
This starts with 256 independent RecordInput read ops in the graph. - -Then, follow each read with identical set of ops for processing. Each set is -considered independent and will execute in parallel. The operations include -image decoding, image distortion, and resizing. - -Finally, once the images are ready, they will be concatenated together into 8 -batch-size 32 tensors. -Note that we can use “tf.concat” for this purpose. -However, “tf.concat” is implemented as a single op, which waits for all -the inputs to be ready, and then concatenates them together. Since all -inputs are produced in parallel, there will be a long tail waiting for all -inputs to be available; and when concatenation happens, the op becomes memory -limited as all input tensors compete for memory bandwidth. -So for the final concatenation, we use `tf.parallel_stack` instead. This +After images are read from `RecordInput` they are passed as tensors to the image +processing pipeline. To make the image processing pipeline easier to explain, +assume that the input pipeline is targeting 8 GPUs with a batch size of 256 (32 +per GPU). + +256 records are read and processed individually in parallel. This starts with +256 independent `RecordInput` read ops in the graph. Each read op is followed by +an identical set of ops for image preprocessing that are considered independent +and executed in parallel. The image preprocessing ops include operations such as +image decoding, distortion, and resizing. + +Once the images are through preprocessing, they are concatenated together into 8 +batch size 32 tensors. Rather than use @{tf.concat} for this purpose, which is +implemented as a single op that waits for all the inputs to be ready before +concatenating them together, @{tf.parallel_stack} is used. @{tf.parallel_stack} allocates an uninitialized tensor as an output, and each input tensor is written to its designated portion of the output tensor as soon as the input is -available. When all the input tensors are finished, the output tensor is passed -along in the graph. This effectively hides all the memory latency with the long -tail of producing all the input tensors. +available. + +When all the input tensors are finished, the output tensor is passed along in +the graph. This effectively hides all the memory latency with the long tail of +producing all the input tensors. ### Parallelize CPU-to-GPU Data Transfer -In our example, once all the input images are processed and concatenated -together by the CPU, we have 8 tensors, each of which has a batch-size of 32. -These tensors are then to be used by the GPU for the model training. +Continuing with the assumption that the target is 8 GPUs with a batch size of +256 (32 per GPU). Once the input images are processed and concatenated together +by the CPU, we have 8 tensors each with a batch-size of 32. -In TensorFlow, users can use tensors from one device on any other device -directly. TensorFlow inserts implicit copies to make the tensors available on -any devices where they are used. The runtime schedules the copy between devices -to run before the tensors are actually used. However, if the copy cannot finish -in time, the computation that needs those tensors will stall. +TensorFlow enables tensors from one device to be used on any other device +directly. TensorFlow inserts implicit copies to make the tensors available on +any devices where they are used. The runtime schedules the copy between devices +to run before the tensors are actually used. 
However, if the copy cannot finish +in time, the computation that needs those tensors will stall and result in +decreased performance. -For high-performance models, it is helpful to explicitly schedule the copy ahead -of the time in parallel, so when the computation starts on GPU, all the tensors -are already available on the right device. +In this implementation, `data_flow_ops.StagingArea` is used to explicitly +schedule the copy in parallel. The end result is that when computation starts on +the GPU, all the tensors are already available. ### Software Pipelining -With all the stages capable of being driven by different processors, we insert -`data_flow_ops.StagingArea` in between them so they run in parallel. -`StagingArea` is a queue-like operator similar to `tf.FifoQueue`. -But it offers simpler functionalities and can be executed on both CPU and GPU. +With all the stages capable of being driven by different processors, +`data_flow_ops.StagingArea` is used between them so they run in parallel. +`StagingArea` is a queue-like operator similar to @{tf.FIFOQueue} that offers +simpler functionalities that can be executed on both CPU and GPU. -Before the model starts running all the stages, we warm up the stages in order -so the staging buffers in between all have one set of data in them. -During each run step that follows, we will run all the stages. -They read one set of data from the staging buffers at the beginning of each -stage, and push one set at end end. +Before the model starts running all the stages, the input pipeline stages are +warmed up to prime the staging buffers in between with one set of data. +During each run step, one set of data is read from the staging buffers at +the beginning of each stage, and one set is pushed at the end. -For example: if there are three stages: A, B and C. -There are two staging areas in between: S1 and S2. -During the warm up, we run: +For example: if there are three stages: A, B and C. There are two staging areas +in between: S1 and S2. During the warm up, we run: ``` Warm up: @@ -162,123 +116,126 @@ Step 4: A3 B2 C1 Step 5: A4 B3 C2 ``` -After the warm up, S1 and S2 each have one set of data in them. -For each step of the actual execution, one set of data is consumed from each -staging area, and one set is added to each. +After the warm up, S1 and S2 each have one set of data in them. For each step of +the actual execution, one set of data is consumed from each staging area, and +one set is added to each. -There are a few nice properties about the scheme: +Benefits of using this scheme: -* All the stages are non-blocking, since the staging areas always have one set -of data after the warm up. -* Each stage can run in parallel since they can all start immediately. -* The staging buffers have a fixed memory overhead. They will have at most one - extra set of data. -* Only a single`session.run()` call is needed to run all stages of the step, - which makes profiling and debugging much easier. +* All stages are non-blocking, since the staging areas always have one set of + data after the warm up. +* Each stage can run in parallel since they can all start immediately. +* The staging buffers have a fixed memory overhead. They will have at most one + extra set of data. +* Only a single`session.run()` call is needed to run all stages of the step, + which makes profiling and debugging much easier. 
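As a minimal sketch of the warm-up and steady-state pattern described above (not the benchmark code itself), the following assumes a single staging area between two stages; the dummy input tensors, batch shape, and step count are illustrative only.

```python
import tensorflow as tf
from tensorflow.python.ops import data_flow_ops

# Stand-ins for the output of the image preprocessing stage.
images = tf.random_uniform([32, 224, 224, 3])
labels = tf.random_uniform([32], maxval=1000, dtype=tf.int32)

# Staging buffer (S1) between the preprocessing stage and the consuming stage.
stage = data_flow_ops.StagingArea(
    dtypes=[tf.float32, tf.int32],
    shapes=[[32, 224, 224, 3], [32]])
stage_put = stage.put([images, labels])
staged_images, staged_labels = stage.get()

# Stand-in for the model computation that consumes the staged batch.
train_op = tf.reduce_sum(staged_images)

with tf.Session() as sess:
  # Warm up: prime the staging area with one set of data.
  sess.run(stage_put)
  for _ in range(5):
    # A single session.run drives both stages: get() feeds the model while
    # put() refills the buffer for the next step.
    sess.run([train_op, stage_put])
```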
## Best Practices in Building High-Performance Models -The computation on GPU can happen immediately since the input data have already -been transferred onto GPU when the step starts. -But it is still important to build the model that runs as fast as possible. -Here are some tips for a high-performance convolutional neural network (CNN) -model: +Collected below are a couple of additional best practices that can improve +performance and increase the flexibility of models. ### Build the model with both NHWC and NCHW Most TensorFlow operations used by a CNN support both NHWC and NCHW data format. -On GPU, NCHW is faster. -But on CPU, NHWC is sometimes faster. +On GPU, NCHW is faster. But on CPU, NHWC is sometimes faster. -So it is a good idea to build the model that can work in both ways. -Our model shows a good way to do that effectively. -For GPU training, we should always use NCHW. -But if the model needs inference on CPU, we could use NHWC; weights obtained -from training with NCHW data format can be used for inference in NHWC data -format. +Building a model to support both data formats keeps the model flexible and +capable of operating optimally regardless of platform. Most TensorFlow +operations used by a CNN support both NHWC and NCHW data format. The benchmark +script was written to support both NCHW and NHWC. NCHW should always be used +when training with GPUs. NHWC is sometimes faster on CPU. A flexible model can +be trained on GPUs using NCHW with inference done on CPU using NHWC with the +weights obtained from training. ### Use Fused Batch-Normalization The default batch-normalization in TensorFlow is implemented as composite -operations. -This is very general, but often leads to suboptimal performance. -An alternative is the fused batch-normalization, and the performance on GPU -is often much faster. +operations. This is very general, but often leads to suboptimal performance. An +alternative is to use fused batch-normalization, which often has much better +performance on GPU. Below is an example of using @{tf.contrib.layers.batch_norm} +to implement fused batch-normalization. + +```python +bn = tf.contrib.layers.batch_norm( + input_layer, fused=True, data_format='NCHW', + scope=scope) +``` + ## Variable Distribution and Gradient Aggregation During training, training variable values are updated using aggregated gradients -and deltas. In this model, we demonstrate that with the flexible and -general-purpose TensorFlow primitives, it is fairly easy to build a diverse -range of high-performance distribution and aggregation schemes for different -types of systems. - -For example: - -* The standard parameter-server where each replica of the training model reads - the variables directly, and updates the variable independently. When each - model needs the variables, they are copied over through the standard implicit - copies added by the TensorFlow runtime. It is shown how to use this method - in either local training, distributed synchronous training, and distributed - asynchronous training. -* A replicated mode for local training where each GPU has an identical - copy of the training parameters. The forward and backward computation can - start immediately as the variable data is immediately available. Gradients - are accumulated across all GPUs, and the aggregated total is applied to - each GPU's copy of the variables so that they stay in sync.
-* A distributed replicated mode of training where each GPU has an identical copy - of the training parameters, and a master copy of the variables is stored - on the parameter-servers. The forward and backward computation can - start immediately as the variable data is immediately available. Gradients - are accumulated across all GPUs on each server and then the per-server - aggregated gradients are applied to the master copy. After all workers do - this, each worker updates its copy of the variable from the master copy. - -We show that most of the variable distribution and aggregation subsystem can -be implemented through TensorFlow low-level primitives with manageable -complexity at the model level. Here we discuss some more details. - -### Parameter-server Variables - -The most common way trainable variables are managed in TensorFlow models is the +and deltas. In the benchmark script, we demonstrate that with the flexible and +general-purpose TensorFlow primitives, a diverse range of high-performance +distribution and aggregation schemes can be built. + +Three examples of variable distribution and aggregation were included in the +script: + +* `parameter_server` where each replica of the training model reads the + variables from a parameter server and updates the variable independently. + When each model needs the variables, they are copied over through the + standard implicit copies added by the TensorFlow runtime. The example + [script](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) + illustrates using this method for local training, distributed synchronous + training, and distributed asynchronous training. +* `replicated` places an identical copy of each training variable on each + GPU. The forward and backward computation can start immediately as the + variable data is immediately available. Gradients are accumulated across all + GPUs, and the aggregated total is applied to each GPU's copy of the + variables to keep them in sync. +* `distributed_replicated` places an identical copy of the training parameters + on each GPU along with a master copy on the parameter servers. The forward + and backward computation can start immediately as the variable data is + immediately available. Gradients are accumulated across all GPUs on each + server and then the per-server aggregated gradients are applied to the + master copy. After all workers do this, each worker updates its copy of the + variable from the master copy. + +Below are additional details about each approach. + +### Parameter Server Variables + +The most common way trainable variables are managed in TensorFlow models is parameter server mode. -In a distributed system, this means that each worker process runs the same -model, and parameter server processes own the master copies of the variables. -When a worker needs a variable from a parameter server, it refers to it -directly. The TensorFlow runtime adds implicit copies to the graph to make the -variable value available on the computation device that needs it. When a -gradient is computed on a worker, it is sent to the parameter server that owns -the particular variable, and the corresponding optimizer is used to update the -variable. +In a distributed system, each worker process runs the same model, and parameter +server processes own the master copies of the variables. When a worker needs a +variable from a parameter server, it refers to it directly. 
The TensorFlow +runtime adds implicit copies to the graph to make the variable value available +on the computation device that needs it. When a gradient is computed on a +worker, it is sent to the parameter server that owns the particular variable, +and the corresponding optimizer is used to update the variable. There are some techniques to improve throughput: -* The variables are spread among parameter servers based on their size, for load - balancing. -* When each worker has multiple GPUs, gradients are accumulated across the GPUs - and a single aggregated gradient is sent to the parameter server. This reduces - the network bandwidth and the amount of work done by the parameter servers. +* The variables are spread among parameter servers based on their size, for + load balancing. +* When each worker has multiple GPUs, gradients are accumulated across the + GPUs and a single aggregated gradient is sent to the parameter server. This + reduces the network bandwidth and the amount of work done by the parameter + servers. For coordinating between workers, a very common mode is async updates, where each worker updates the master copy of the variables without synchronizing with -other workers. In our model, we demonstrate that it is fairly easy to introduce +other workers. In our model, we demonstrate that it is fairly easy to introduce synchronization across workers so updates for all workers are finished in one step before the next step can start. -The parameter-server method can also be used for local training, In this case, +The parameter server method can also be used for local training, In this case, instead of spreading the master copies of variables across parameters servers, they are either on the CPU or spread across the available GPUs. Due to the simple nature of this setup, this architecture has gained a lot of popularity within the community. -This is available in the benchmark scripts as the 'parameter_server' -variable_update mode. +This mode can be used in the script by passing +`--variable_update=parameter_server`. -![parameter_server mode in distributed -training](../images/perf_parameter_server_mode_doc.png){ -width="900" style="max-width: inherit"} +
+ parameter_server mode in distributed training +
### Replicated Variables @@ -292,19 +249,18 @@ devices and the fully aggregated gradient is then applied to each local copy. Gradient aggregation across the server can be done in different ways: -* Using standard TensorFlow operations to accumulate the total on a single - device (CPU or GPU) and then copy it back to all GPUs. -* Using NVIDIA NCCL, described below in the NCCL section. +* Using standard TensorFlow operations to accumulate the total on a single + device (CPU or GPU) and then copy it back to all GPUs. +* Using NVIDIA® NCCL, described below in the NCCL section. -This is available in the benchmark scripts for local execution only, as the -'replicated' variable_update mode. +This mode can be used in the script by passing `--variable_update=replicated`. ### Replicated Variables in Distributed Training -The replicated method for variables can be extended to distributed training. -One way to do this like the replicated mode: aggregate the gradients fully -across the cluster and apply them to each local copy of the variable. This may -be shown in a future version of this scripts; the scripts do present a different +The replicated method for variables can be extended to distributed training. One +way to do this like the replicated mode: aggregate the gradients fully across +the cluster and apply them to each local copy of the variable. This may be shown +in a future version of this scripts; the scripts do present a different variation, described here. In this mode, in addition to each GPU's copy of the variables, a master copy is @@ -314,28 +270,30 @@ immediately using the local copies of the variables. As the gradients of the weights become available, they are sent back to the parameter servers and all local copies are updated: -1. All the gradients from the GPU on the same worker are aggregated together. -2. Aggregated gradients from each worker are sent to the parameter server that - owns the variable, where the specified optimizer is used to update the - master copy of the variable. -3. Each worker updates its local copy of the variable from the master. In - the example model, this is done with a cross-replica barrier that waits for - all the workers to finish updating the variables, and fetches the new - variable only after the barrier has been released by all replicas. Once the - copy finishes for all variables, this marks the end of a training step, and a - new step can start. +1. All the gradients from the GPU on the same worker are aggregated together. +2. Aggregated gradients from each worker are sent to the parameter server that + owns the variable, where the specified optimizer is used to update the + master copy of the variable. +3. Each worker updates its local copy of the variable from the master. In the + example model, this is done with a cross-replica barrier that waits for all + the workers to finish updating the variables, and fetches the new variable + only after the barrier has been released by all replicas. Once the copy + finishes for all variables, this marks the end of a training step, and a new + step can start. Although this sounds similar to the standard use of parameter servers, the -performance is often better in many cases. This is largely due to the fact the +performance is often better in many cases. This is largely due to the fact the computation can happen without any delay, and much of the copy latency of early gradients can be hidden by later computation layers. 
-This is available in the benchmark scripts as the 'distributed_replicated'
-variable_update mode.
+This mode can be used in the script by passing
+`--variable_update=distributed_replicated`.
+
-![distributed_replicated mode](
-../images/perf_distributed_replicated_mode_doc.png){
-width="900" style="max-width: inherit"}
+<img src="../images/perf_distributed_replicated_mode_doc.png"
+     alt="distributed_replicated mode">
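Before turning to NCCL, here is a minimal sketch of the first aggregation option listed under Replicated Variables: accumulating the per-GPU gradients with standard TensorFlow operations and handing the same averaged gradient to every local copy. It assumes one list of `(gradient, variable)` pairs and one optimizer per GPU, and is not taken from the benchmark scripts.

```python
# Illustrative only: sum per-GPU gradients with tf.add_n, average them, and
# apply the identical result to each GPU's copy of the variables.
import tensorflow as tf

def apply_aggregated_gradients(tower_grads_and_vars, optimizers):
  """tower_grads_and_vars: one [(grad, var), ...] list per GPU.
  optimizers: one optimizer per GPU, each updating that GPU's variable copies.
  """
  per_gpu_grads_and_vars = [[] for _ in optimizers]
  for pairs_for_one_variable in zip(*tower_grads_and_vars):
    grads = [g for g, _ in pairs_for_one_variable]
    # Accumulate the total on a single device and average it; NCCL (described
    # in the next section) is the alternative way to do this step.
    avg_grad = tf.add_n(grads) / float(len(grads))
    for gpu, (_, var) in enumerate(pairs_for_one_variable):
      per_gpu_grads_and_vars[gpu].append((avg_grad, var))
  apply_ops = [opt.apply_gradients(gv)
               for opt, gv in zip(optimizers, per_gpu_grads_and_vars)]
  return tf.group(*apply_ops)
```

The averaged gradient is computed once per variable; the copies to each GPU happen through the implicit copy mechanism discussed above.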
#### NCCL @@ -343,47 +301,29 @@ In order to broadcast variables and aggregate gradients across different GPUs within the same host machine, we can use the default TensorFlow implicit copy mechanism. -However, we can instead use the optional NCCL support. NCCL is an NVIDIA -library that can efficiently broadcast and aggregate data across different GPUs. -It schedules a cooperating kernel on each GPU that knows how to best utilize the -underlying hardware topology; this kernel uses a single SM of the GPU. +However, we can instead use the optional NCCL (@{tf.contrib.nccl}) support. NCCL +is an NVIDIA® library that can efficiently broadcast and aggregate data across +different GPUs. It schedules a cooperating kernel on each GPU that knows how to +best utilize the underlying hardware topology; this kernel uses a single SM of +the GPU. In our experiment, we demonstrate that although NCCL often leads to much faster -data aggregation by itself, it doesn't necessarily lead to faster training. Our +data aggregation by itself, it doesn't necessarily lead to faster training. Our hypothesis is that the implicit copies are essentially free since they go to the copy engine on GPU, as long as its latency can be hidden by the main computation -itself. Although NCCL can transfer data faster, it takes one SM away, and adds -more pressure to the underlying L2 cache. Our results show that for 8-GPUs, -NCCL often leads to better performance. However, for fewer GPUs, the implicit -copies often perform better. +itself. Although NCCL can transfer data faster, it takes one SM away, and adds +more pressure to the underlying L2 cache. Our results show that for 8-GPUs, NCCL +often leads to better performance. However, for fewer GPUs, the implicit copies +often perform better. #### Staged Variables We further introduce a staged-variable mode where we use staging areas for both -the variable reads, and their updates. -Similar to software pipelining of the input pipeline, this can hide the data -copy latency. -If the computation time takes longer than the copy and aggregation, the copy -itself becomes essentially free. +the variable reads, and their updates. Similar to software pipelining of the +input pipeline, this can hide the data copy latency. If the computation time +takes longer than the copy and aggregation, the copy itself becomes essentially +free. The downside is that all the weights read are from the previous training step. -So it is a different algorithm from SGD. -But it is possible to improve its convergence by adjusting learning rate and -other hyperparameters. - -## Conclusions - -In this high-performance model, we present a number of options to build -high-performance models in TensorFlow. -Due to the flexible design in TensorFlow, advanced features like this often -requires no system-level changes, and can be largely achieved through -model-level changes. - -We do not claim which combination works best for a particular model. -That should be left to the engineers who build the model and the training system. -Many of the ingredients of the high-performance model will find their ways -to high-level primitives that become transparent to users. -However, we have shown that advanced users can easily tune and modify the -underlying model behavior using low-level primitives. -This could be very useful when improving performance for particular system -setups and model configurations. +So it is a different algorithm from SGD. 
But it is possible to improve its +convergence by adjusting learning rate and other hyperparameters. diff --git a/tensorflow/python/estimator/estimator.py b/tensorflow/python/estimator/estimator.py index c04e37eccd..c394315cfa 100644 --- a/tensorflow/python/estimator/estimator.py +++ b/tensorflow/python/estimator/estimator.py @@ -94,13 +94,15 @@ class Estimator(object): * Args: - * `features`: single `Tensor` or `dict` of `Tensor`s - (depending on data passed to `train`), - * `labels`: `Tensor` or `dict` of `Tensor`s (for multi-head - models). If mode is `ModeKeys.PREDICT`, `labels=None` will be - passed. If the `model_fn`'s signature does not accept - `mode`, the `model_fn` must still be able to handle - `labels=None`. + * `features`: This is the first item returned from the `input_fn` + passed to `train`, 'evaluate`, and `predict`. This should be a + single `Tensor` or `dict` of same. + * `labels`: This is the second item returned from the `input_fn` + passed to `train`, 'evaluate`, and `predict`. This should be a + single `Tensor` or `dict` of same (for multi-head models). If + mode is `ModeKeys.PREDICT`, `labels=None` will be passed. If + the `model_fn`'s signature does not accept `mode`, the + `model_fn` must still be able to handle `labels=None`. * `mode`: Optional. Specifies if this training, evaluation or prediction. See `ModeKeys`. * `params`: Optional `dict` of hyperparameters. Will receive what diff --git a/tensorflow/python/framework/tensor_shape.py b/tensorflow/python/framework/tensor_shape.py index 3664710caa..73c810711f 100644 --- a/tensorflow/python/framework/tensor_shape.py +++ b/tensorflow/python/framework/tensor_shape.py @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== - """Helper classes for tensor shape inference.""" from __future__ import absolute_import from __future__ import division @@ -31,8 +30,8 @@ class Dimension(object): self._value = None else: self._value = int(value) - if (not isinstance(value, compat.bytes_or_text_types) - and self._value != value): + if (not isinstance(value, compat.bytes_or_text_types) and + self._value != value): raise ValueError("Ambiguous dimension: %s" % value) if self._value < 0: raise ValueError("Dimension %d must be >= 0" % self._value) @@ -89,9 +88,8 @@ class Dimension(object): True if this Dimension and `other` are compatible. """ other = as_dimension(other) - return (self._value is None - or other.value is None - or self._value == other.value) + return (self._value is None or other.value is None or + self._value == other.value) def assert_is_compatible_with(self, other): """Raises an exception if `other` is not compatible with this Dimension. @@ -104,8 +102,8 @@ class Dimension(object): is_compatible_with). """ if not self.is_compatible_with(other): - raise ValueError("Dimensions %s and %s are not compatible" - % (self, other)) + raise ValueError("Dimensions %s and %s are not compatible" % (self, + other)) def merge_with(self, other): """Returns a Dimension that combines the information in `self` and `other`. @@ -385,18 +383,17 @@ class TensorShape(object): `Tensor`. It may be one of the following: * *Fully-known shape:* has a known number of dimensions and a known size - for each dimension. + for each dimension. e.g. `TensorShape([16, 256])` * *Partially-known shape:* has a known number of dimensions, and an unknown - size for one or more dimension. 
+ size for one or more dimension. e.g. `TensorShape([None, 256])` * *Unknown shape:* has an unknown number of dimensions, and an unknown - size in all dimensions. + size in all dimensions. e.g. `TensorShape(None)` If a tensor is produced by an operation of type `"Foo"`, its shape may be inferred if there is a registered shape function for - `"Foo"`. See @{$adding_an_op#shape-functions-in-c$`Shape functions in C++`} for - details of shape functions and how to register them. Alternatively, - the shape may be set explicitly using - @{tf.Tensor.set_shape}. + `"Foo"`. See @{$adding_an_op#shape-functions-in-c$`Shape functions in C++`} + for details of shape functions and how to register them. Alternatively, + the shape may be set explicitly using @{tf.Tensor.set_shape}. """ def __init__(self, dims): @@ -414,7 +411,7 @@ class TensorShape(object): self._dims = None elif isinstance(dims, compat.bytes_or_text_types): raise TypeError("A string has ambiguous TensorShape, please wrap in a " - "list or convert to an int: %s" % dims) + "list or convert to an int: %s" % dims) elif isinstance(dims, tensor_shape_pb2.TensorShapeProto): if dims.unknown_rank: self._dims = None @@ -422,7 +419,8 @@ class TensorShape(object): self._dims = [ # Protos store variable-size dimensions as -1 as_dimension(dim.size if dim.size != -1 else None) - for dim in dims.dim] + for dim in dims.dim + ] elif isinstance(dims, TensorShape): self._dims = dims.dims else: @@ -519,7 +517,7 @@ class TensorShape(object): # suffixes of otherwise unknown shapes. return unknown_shape() else: - return unknown_shape(ndims=stop-start) + return unknown_shape(ndims=stop - start) else: return Dimension(None) @@ -560,8 +558,7 @@ class TensorShape(object): new_dims.append(dim.merge_with(other[i])) return TensorShape(new_dims) except ValueError: - raise ValueError("Shapes %s and %s are not compatible" % - (self, other)) + raise ValueError("Shapes %s and %s are not compatible" % (self, other)) def concatenate(self, other): """Returns the concatenation of the dimension in `self` and `other`. @@ -599,8 +596,8 @@ class TensorShape(object): other = as_shape(other) if self.ndims is not None and other.ndims is not None: if self.ndims != other.ndims: - raise ValueError( - "Shapes %s and %s must have the same rank" % (self, other)) + raise ValueError("Shapes %s and %s must have the same rank" % (self, + other)) def assert_has_rank(self, rank): """Raises an exception if `self` is not compatible with the given `rank`. @@ -736,8 +733,8 @@ class TensorShape(object): def is_fully_defined(self): """Returns True iff `self` is fully defined in every dimension.""" - return (self._dims is not None - and all(dim.value is not None for dim in self._dims)) + return (self._dims is not None and all(dim.value is not None + for dim in self._dims)) def assert_is_fully_defined(self): """Raises an exception if `self` is not fully defined in every dimension. 
@@ -767,9 +764,10 @@ class TensorShape(object): return tensor_shape_pb2.TensorShapeProto(unknown_rank=True) else: return tensor_shape_pb2.TensorShapeProto(dim=[ - tensor_shape_pb2.TensorShapeProto.Dim( - size=-1 if d.value is None else d.value) - for d in self._dims]) + tensor_shape_pb2.TensorShapeProto.Dim(size=-1 + if d.value is None else d.value) + for d in self._dims + ]) def __eq__(self, other): """Returns True if `self` is equivalent to `other`.""" diff --git a/tensorflow/python/kernel_tests/distributions/BUILD b/tensorflow/python/kernel_tests/distributions/BUILD index 3c1a4d5125..3630adc954 100644 --- a/tensorflow/python/kernel_tests/distributions/BUILD +++ b/tensorflow/python/kernel_tests/distributions/BUILD @@ -41,6 +41,180 @@ cuda_py_test( ], ) +cuda_py_test( + name = "beta_test", + size = "small", + srcs = ["beta_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:client", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:math_ops", + "//tensorflow/python:nn_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "bernoulli_test", + size = "small", + srcs = ["bernoulli_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:array_ops", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:math_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "categorical_test", + size = "small", + srcs = ["categorical_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:array_ops", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:math_ops", + "//tensorflow/python:platform_test", + "//tensorflow/python:random_ops", + ], +) + +cuda_py_test( + name = "dirichlet_test", + size = "small", + srcs = ["dirichlet_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "dirichlet_multinomial_test", + size = "medium", + srcs = ["dirichlet_multinomial_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:array_ops", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:math_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "exponential_test", + srcs = ["exponential_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:client", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:nn_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "gamma_test", + srcs = ["gamma_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + 
"//third_party/py/numpy", + "//tensorflow/python:client", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:nn_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "laplace_test", + srcs = ["laplace_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:client", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:nn_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "multinomial_test", + srcs = ["multinomial_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:array_ops", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:math_ops", + "//tensorflow/python:platform_test", + ], +) + +cuda_py_test( + name = "student_t_test", + size = "small", + srcs = ["student_t_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:client_testlib", + "//tensorflow/python:framework", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:math_ops", + "//tensorflow/python:nn_ops", + "//tensorflow/python:platform_test", + ], + tags = ["nomsan"], # disable to avoid false positives from scipy. +) + +cuda_py_test( + name = "uniform_test", + size = "small", + srcs = ["uniform_test.py"], + additional_deps = [ + "//tensorflow/python/ops/distributions", + "//third_party/py/numpy", + "//tensorflow/python:array_ops", + "//tensorflow/python:client_testlib", + "//tensorflow/python:errors", + "//tensorflow/python:framework_for_generated_wrappers", + "//tensorflow/python:framework_test_lib", + "//tensorflow/python:math_ops", + ], +) + cuda_py_test( name = "normal_test", size = "medium", diff --git a/tensorflow/python/kernel_tests/distributions/bernoulli_test.py b/tensorflow/python/kernel_tests/distributions/bernoulli_test.py new file mode 100644 index 0000000000..ef93c4dab0 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/bernoulli_test.py @@ -0,0 +1,320 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for the Bernoulli distribution.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import bernoulli +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +special = try_import("scipy.special") + + +def make_bernoulli(batch_shape, dtype=dtypes.int32): + p = np.random.uniform(size=list(batch_shape)) + p = constant_op.constant(p, dtype=dtypes.float32) + return bernoulli.Bernoulli(probs=p, dtype=dtype) + + +def entropy(p): + q = 1. - p + return -q * np.log(q) - p * np.log(p) + + +class BernoulliTest(test.TestCase): + + def testP(self): + p = [0.2, 0.4] + dist = bernoulli.Bernoulli(probs=p) + with self.test_session(): + self.assertAllClose(p, dist.probs.eval()) + + def testLogits(self): + logits = [-42., 42.] + dist = bernoulli.Bernoulli(logits=logits) + with self.test_session(): + self.assertAllClose(logits, dist.logits.eval()) + + if not special: + return + + with self.test_session(): + self.assertAllClose(special.expit(logits), dist.probs.eval()) + + p = [0.01, 0.99, 0.42] + dist = bernoulli.Bernoulli(probs=p) + with self.test_session(): + self.assertAllClose(special.logit(p), dist.logits.eval()) + + def testInvalidP(self): + invalid_ps = [1.01, 2.] + for p in invalid_ps: + with self.test_session(): + with self.assertRaisesOpError("probs has components greater than 1"): + dist = bernoulli.Bernoulli(probs=p, validate_args=True) + dist.probs.eval() + + invalid_ps = [-0.01, -3.] 
+ for p in invalid_ps: + with self.test_session(): + with self.assertRaisesOpError("Condition x >= 0"): + dist = bernoulli.Bernoulli(probs=p, validate_args=True) + dist.probs.eval() + + valid_ps = [0.0, 0.5, 1.0] + for p in valid_ps: + with self.test_session(): + dist = bernoulli.Bernoulli(probs=p) + self.assertEqual(p, dist.probs.eval()) # Should not fail + + def testShapes(self): + with self.test_session(): + for batch_shape in ([], [1], [2, 3, 4]): + dist = make_bernoulli(batch_shape) + self.assertAllEqual(batch_shape, dist.batch_shape.as_list()) + self.assertAllEqual(batch_shape, dist.batch_shape_tensor().eval()) + self.assertAllEqual([], dist.event_shape.as_list()) + self.assertAllEqual([], dist.event_shape_tensor().eval()) + + def testDtype(self): + dist = make_bernoulli([]) + self.assertEqual(dist.dtype, dtypes.int32) + self.assertEqual(dist.dtype, dist.sample(5).dtype) + self.assertEqual(dist.dtype, dist.mode().dtype) + self.assertEqual(dist.probs.dtype, dist.mean().dtype) + self.assertEqual(dist.probs.dtype, dist.variance().dtype) + self.assertEqual(dist.probs.dtype, dist.stddev().dtype) + self.assertEqual(dist.probs.dtype, dist.entropy().dtype) + self.assertEqual(dist.probs.dtype, dist.prob(0).dtype) + self.assertEqual(dist.probs.dtype, dist.log_prob(0).dtype) + + dist64 = make_bernoulli([], dtypes.int64) + self.assertEqual(dist64.dtype, dtypes.int64) + self.assertEqual(dist64.dtype, dist64.sample(5).dtype) + self.assertEqual(dist64.dtype, dist64.mode().dtype) + + def _testPmf(self, **kwargs): + dist = bernoulli.Bernoulli(**kwargs) + with self.test_session(): + # pylint: disable=bad-continuation + xs = [ + 0, + [1], + [1, 0], + [[1, 0]], + [[1, 0], [1, 1]], + ] + expected_pmfs = [ + [[0.8, 0.6], [0.7, 0.4]], + [[0.2, 0.4], [0.3, 0.6]], + [[0.2, 0.6], [0.3, 0.4]], + [[0.2, 0.6], [0.3, 0.4]], + [[0.2, 0.6], [0.3, 0.6]], + ] + # pylint: enable=bad-continuation + + for x, expected_pmf in zip(xs, expected_pmfs): + self.assertAllClose(dist.prob(x).eval(), expected_pmf) + self.assertAllClose(dist.log_prob(x).eval(), np.log(expected_pmf)) + + def testPmfCorrectBroadcastDynamicShape(self): + with self.test_session(): + p = array_ops.placeholder(dtype=dtypes.float32) + dist = bernoulli.Bernoulli(probs=p) + event1 = [1, 0, 1] + event2 = [[1, 0, 1]] + self.assertAllClose( + dist.prob(event1).eval({ + p: [0.2, 0.3, 0.4] + }), [0.2, 0.7, 0.4]) + self.assertAllClose( + dist.prob(event2).eval({ + p: [0.2, 0.3, 0.4] + }), [[0.2, 0.7, 0.4]]) + + def testPmfInvalid(self): + p = [0.1, 0.2, 0.7] + with self.test_session(): + dist = bernoulli.Bernoulli(probs=p, validate_args=True) + with self.assertRaisesOpError("must be non-negative."): + dist.prob([1, 1, -1]).eval() + with self.assertRaisesOpError("is not less than or equal to 1."): + dist.prob([2, 0, 1]).eval() + + def testPmfWithP(self): + p = [[0.2, 0.4], [0.3, 0.6]] + self._testPmf(probs=p) + if not special: + return + self._testPmf(logits=special.logit(p)) + + def testBroadcasting(self): + with self.test_session(): + p = array_ops.placeholder(dtypes.float32) + dist = bernoulli.Bernoulli(probs=p) + self.assertAllClose(np.log(0.5), dist.log_prob(1).eval({p: 0.5})) + self.assertAllClose( + np.log([0.5, 0.5, 0.5]), dist.log_prob([1, 1, 1]).eval({ + p: 0.5 + })) + self.assertAllClose( + np.log([0.5, 0.5, 0.5]), dist.log_prob(1).eval({ + p: [0.5, 0.5, 0.5] + })) + + def testPmfShapes(self): + with self.test_session(): + p = array_ops.placeholder(dtypes.float32, shape=[None, 1]) + dist = bernoulli.Bernoulli(probs=p) + self.assertEqual(2, 
len(dist.log_prob(1).eval({p: [[0.5], [0.5]]}).shape)) + + with self.test_session(): + dist = bernoulli.Bernoulli(probs=0.5) + self.assertEqual(2, len(dist.log_prob([[1], [1]]).eval().shape)) + + with self.test_session(): + dist = bernoulli.Bernoulli(probs=0.5) + self.assertEqual((), dist.log_prob(1).get_shape()) + self.assertEqual((1), dist.log_prob([1]).get_shape()) + self.assertEqual((2, 1), dist.log_prob([[1], [1]]).get_shape()) + + with self.test_session(): + dist = bernoulli.Bernoulli(probs=[[0.5], [0.5]]) + self.assertEqual((2, 1), dist.log_prob(1).get_shape()) + + def testBoundaryConditions(self): + with self.test_session(): + dist = bernoulli.Bernoulli(probs=1.0) + self.assertAllClose(np.nan, dist.log_prob(0).eval()) + self.assertAllClose([np.nan], [dist.log_prob(1).eval()]) + + def testEntropyNoBatch(self): + p = 0.2 + dist = bernoulli.Bernoulli(probs=p) + with self.test_session(): + self.assertAllClose(dist.entropy().eval(), entropy(p)) + + def testEntropyWithBatch(self): + p = [[0.1, 0.7], [0.2, 0.6]] + dist = bernoulli.Bernoulli(probs=p, validate_args=False) + with self.test_session(): + self.assertAllClose(dist.entropy().eval(), [[entropy(0.1), entropy(0.7)], + [entropy(0.2), entropy(0.6)]]) + + def testSampleN(self): + with self.test_session(): + p = [0.2, 0.6] + dist = bernoulli.Bernoulli(probs=p) + n = 100000 + samples = dist.sample(n) + samples.set_shape([n, 2]) + self.assertEqual(samples.dtype, dtypes.int32) + sample_values = samples.eval() + self.assertTrue(np.all(sample_values >= 0)) + self.assertTrue(np.all(sample_values <= 1)) + # Note that the standard error for the sample mean is ~ sqrt(p * (1 - p) / + # n). This means that the tolerance is very sensitive to the value of p + # as well as n. + self.assertAllClose(p, np.mean(sample_values, axis=0), atol=1e-2) + self.assertEqual(set([0, 1]), set(sample_values.flatten())) + # In this test we're just interested in verifying there isn't a crash + # owing to mismatched types. b/30940152 + dist = bernoulli.Bernoulli(np.log([.2, .4])) + self.assertAllEqual((1, 2), dist.sample(1, seed=42).get_shape().as_list()) + + def testSampleActsLikeSampleN(self): + with self.test_session() as sess: + p = [0.2, 0.6] + dist = bernoulli.Bernoulli(probs=p) + n = 1000 + seed = 42 + self.assertAllEqual( + dist.sample(n, seed).eval(), dist.sample(n, seed).eval()) + n = array_ops.placeholder(dtypes.int32) + sample, sample = sess.run([dist.sample(n, seed), dist.sample(n, seed)], + feed_dict={n: 1000}) + self.assertAllEqual(sample, sample) + + def testMean(self): + with self.test_session(): + p = np.array([[0.2, 0.7], [0.5, 0.4]], dtype=np.float32) + dist = bernoulli.Bernoulli(probs=p) + self.assertAllEqual(dist.mean().eval(), p) + + def testVarianceAndStd(self): + var = lambda p: p * (1. 
- p) + with self.test_session(): + p = [[0.2, 0.7], [0.5, 0.4]] + dist = bernoulli.Bernoulli(probs=p) + self.assertAllClose( + dist.variance().eval(), + np.array( + [[var(0.2), var(0.7)], [var(0.5), var(0.4)]], dtype=np.float32)) + self.assertAllClose( + dist.stddev().eval(), + np.array( + [[np.sqrt(var(0.2)), np.sqrt(var(0.7))], + [np.sqrt(var(0.5)), np.sqrt(var(0.4))]], + dtype=np.float32)) + + def testBernoulliWithSigmoidProbs(self): + p = np.array([8.3, 4.2]) + dist = bernoulli.BernoulliWithSigmoidProbs(logits=p) + with self.test_session(): + self.assertAllClose(math_ops.sigmoid(p).eval(), dist.probs.eval()) + + def testBernoulliBernoulliKL(self): + with self.test_session() as sess: + batch_size = 6 + a_p = np.array([0.5] * batch_size, dtype=np.float32) + b_p = np.array([0.4] * batch_size, dtype=np.float32) + + a = bernoulli.Bernoulli(probs=a_p) + b = bernoulli.Bernoulli(probs=b_p) + + kl = kullback_leibler.kl_divergence(a, b) + kl_val = sess.run(kl) + + kl_expected = (a_p * np.log(a_p / b_p) + (1. - a_p) * np.log( + (1. - a_p) / (1. - b_p))) + + self.assertEqual(kl.get_shape(), (batch_size,)) + self.assertAllClose(kl_val, kl_expected) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/beta_test.py b/tensorflow/python/kernel_tests/distributions/beta_test.py new file mode 100644 index 0000000000..91a451f033 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/beta_test.py @@ -0,0 +1,394 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.client import session +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import random_seed +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops.distributions import beta as beta_lib +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +special = try_import("scipy.special") +stats = try_import("scipy.stats") + + +class BetaTest(test.TestCase): + + def testSimpleShapes(self): + with self.test_session(): + a = np.random.rand(3) + b = np.random.rand(3) + dist = beta_lib.Beta(a, b) + self.assertAllEqual([], dist.event_shape_tensor().eval()) + self.assertAllEqual([3], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([3]), dist.batch_shape) + + def testComplexShapes(self): + with self.test_session(): + a = np.random.rand(3, 2, 2) + b = np.random.rand(3, 2, 2) + dist = beta_lib.Beta(a, b) + self.assertAllEqual([], dist.event_shape_tensor().eval()) + self.assertAllEqual([3, 2, 2], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([]), dist.event_shape) + self.assertEqual( + tensor_shape.TensorShape([3, 2, 2]), dist.batch_shape) + + def testComplexShapesBroadcast(self): + with self.test_session(): + a = np.random.rand(3, 2, 2) + b = np.random.rand(2, 2) + dist = beta_lib.Beta(a, b) + self.assertAllEqual([], dist.event_shape_tensor().eval()) + self.assertAllEqual([3, 2, 2], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([]), dist.event_shape) + self.assertEqual( + tensor_shape.TensorShape([3, 2, 2]), dist.batch_shape) + + def testAlphaProperty(self): + a = [[1., 2, 3]] + b = [[2., 4, 3]] + with self.test_session(): + dist = beta_lib.Beta(a, b) + self.assertEqual([1, 3], dist.concentration1.get_shape()) + self.assertAllClose(a, dist.concentration1.eval()) + + def testBetaProperty(self): + a = [[1., 2, 3]] + b = [[2., 4, 3]] + with self.test_session(): + dist = beta_lib.Beta(a, b) + self.assertEqual([1, 3], dist.concentration0.get_shape()) + self.assertAllClose(b, dist.concentration0.eval()) + + def testPdfXProper(self): + a = [[1., 2, 3]] + b = [[2., 4, 3]] + with self.test_session(): + dist = beta_lib.Beta(a, b, validate_args=True) + dist.prob([.1, .3, .6]).eval() + dist.prob([.2, .3, .5]).eval() + # Either condition can trigger. + with self.assertRaisesOpError("sample must be positive"): + dist.prob([-1., 0.1, 0.5]).eval() + with self.assertRaisesOpError("sample must be positive"): + dist.prob([0., 0.1, 0.5]).eval() + with self.assertRaisesOpError("sample must be no larger than `1`"): + dist.prob([.1, .2, 1.2]).eval() + + def testPdfTwoBatches(self): + with self.test_session(): + a = [1., 2] + b = [1., 2] + x = [.5, .5] + dist = beta_lib.Beta(a, b) + pdf = dist.prob(x) + self.assertAllClose([1., 3. 
/ 2], pdf.eval()) + self.assertEqual((2,), pdf.get_shape()) + + def testPdfTwoBatchesNontrivialX(self): + with self.test_session(): + a = [1., 2] + b = [1., 2] + x = [.3, .7] + dist = beta_lib.Beta(a, b) + pdf = dist.prob(x) + self.assertAllClose([1, 63. / 50], pdf.eval()) + self.assertEqual((2,), pdf.get_shape()) + + def testPdfUniformZeroBatch(self): + with self.test_session(): + # This is equivalent to a uniform distribution + a = 1. + b = 1. + x = np.array([.1, .2, .3, .5, .8], dtype=np.float32) + dist = beta_lib.Beta(a, b) + pdf = dist.prob(x) + self.assertAllClose([1.] * 5, pdf.eval()) + self.assertEqual((5,), pdf.get_shape()) + + def testPdfAlphaStretchedInBroadcastWhenSameRank(self): + with self.test_session(): + a = [[1., 2]] + b = [[1., 2]] + x = [[.5, .5], [.3, .7]] + dist = beta_lib.Beta(a, b) + pdf = dist.prob(x) + self.assertAllClose([[1., 3. / 2], [1., 63. / 50]], pdf.eval()) + self.assertEqual((2, 2), pdf.get_shape()) + + def testPdfAlphaStretchedInBroadcastWhenLowerRank(self): + with self.test_session(): + a = [1., 2] + b = [1., 2] + x = [[.5, .5], [.2, .8]] + pdf = beta_lib.Beta(a, b).prob(x) + self.assertAllClose([[1., 3. / 2], [1., 24. / 25]], pdf.eval()) + self.assertEqual((2, 2), pdf.get_shape()) + + def testPdfXStretchedInBroadcastWhenSameRank(self): + with self.test_session(): + a = [[1., 2], [2., 3]] + b = [[1., 2], [2., 3]] + x = [[.5, .5]] + pdf = beta_lib.Beta(a, b).prob(x) + self.assertAllClose([[1., 3. / 2], [3. / 2, 15. / 8]], pdf.eval()) + self.assertEqual((2, 2), pdf.get_shape()) + + def testPdfXStretchedInBroadcastWhenLowerRank(self): + with self.test_session(): + a = [[1., 2], [2., 3]] + b = [[1., 2], [2., 3]] + x = [.5, .5] + pdf = beta_lib.Beta(a, b).prob(x) + self.assertAllClose([[1., 3. / 2], [3. / 2, 15. / 8]], pdf.eval()) + self.assertEqual((2, 2), pdf.get_shape()) + + def testBetaMean(self): + with session.Session(): + a = [1., 2, 3] + b = [2., 4, 1.2] + dist = beta_lib.Beta(a, b) + self.assertEqual(dist.mean().get_shape(), (3,)) + if not stats: + return + expected_mean = stats.beta.mean(a, b) + self.assertAllClose(expected_mean, dist.mean().eval()) + + def testBetaVariance(self): + with session.Session(): + a = [1., 2, 3] + b = [2., 4, 1.2] + dist = beta_lib.Beta(a, b) + self.assertEqual(dist.variance().get_shape(), (3,)) + if not stats: + return + expected_variance = stats.beta.var(a, b) + self.assertAllClose(expected_variance, dist.variance().eval()) + + def testBetaMode(self): + with session.Session(): + a = np.array([1.1, 2, 3]) + b = np.array([2., 4, 1.2]) + expected_mode = (a - 1) / (a + b - 2) + dist = beta_lib.Beta(a, b) + self.assertEqual(dist.mode().get_shape(), (3,)) + self.assertAllClose(expected_mode, dist.mode().eval()) + + def testBetaModeInvalid(self): + with session.Session(): + a = np.array([1., 2, 3]) + b = np.array([2., 4, 1.2]) + dist = beta_lib.Beta(a, b, allow_nan_stats=False) + with self.assertRaisesOpError("Condition x < y.*"): + dist.mode().eval() + + a = np.array([2., 2, 3]) + b = np.array([1., 4, 1.2]) + dist = beta_lib.Beta(a, b, allow_nan_stats=False) + with self.assertRaisesOpError("Condition x < y.*"): + dist.mode().eval() + + def testBetaModeEnableAllowNanStats(self): + with session.Session(): + a = np.array([1., 2, 3]) + b = np.array([2., 4, 1.2]) + dist = beta_lib.Beta(a, b, allow_nan_stats=True) + + expected_mode = (a - 1) / (a + b - 2) + expected_mode[0] = np.nan + self.assertEqual((3,), dist.mode().get_shape()) + self.assertAllClose(expected_mode, dist.mode().eval()) + + a = np.array([2., 2, 3]) + b = 
np.array([1., 4, 1.2]) + dist = beta_lib.Beta(a, b, allow_nan_stats=True) + + expected_mode = (a - 1) / (a + b - 2) + expected_mode[0] = np.nan + self.assertEqual((3,), dist.mode().get_shape()) + self.assertAllClose(expected_mode, dist.mode().eval()) + + def testBetaEntropy(self): + with session.Session(): + a = [1., 2, 3] + b = [2., 4, 1.2] + dist = beta_lib.Beta(a, b) + self.assertEqual(dist.entropy().get_shape(), (3,)) + if not stats: + return + expected_entropy = stats.beta.entropy(a, b) + self.assertAllClose(expected_entropy, dist.entropy().eval()) + + def testBetaSample(self): + with self.test_session(): + a = 1. + b = 2. + beta = beta_lib.Beta(a, b) + n = constant_op.constant(100000) + samples = beta.sample(n) + sample_values = samples.eval() + self.assertEqual(sample_values.shape, (100000,)) + self.assertFalse(np.any(sample_values < 0.0)) + if not stats: + return + self.assertLess( + stats.kstest( + # Beta is a univariate distribution. + sample_values, + stats.beta(a=1., b=2.).cdf)[0], + 0.01) + # The standard error of the sample mean is 1 / (sqrt(18 * n)) + self.assertAllClose( + sample_values.mean(axis=0), stats.beta.mean(a, b), atol=1e-2) + self.assertAllClose( + np.cov(sample_values, rowvar=0), stats.beta.var(a, b), atol=1e-1) + + # Test that sampling with the same seed twice gives the same results. + def testBetaSampleMultipleTimes(self): + with self.test_session(): + a_val = 1. + b_val = 2. + n_val = 100 + + random_seed.set_random_seed(654321) + beta1 = beta_lib.Beta(concentration1=a_val, + concentration0=b_val, + name="beta1") + samples1 = beta1.sample(n_val, seed=123456).eval() + + random_seed.set_random_seed(654321) + beta2 = beta_lib.Beta(concentration1=a_val, + concentration0=b_val, + name="beta2") + samples2 = beta2.sample(n_val, seed=123456).eval() + + self.assertAllClose(samples1, samples2) + + def testBetaSampleMultidimensional(self): + with self.test_session(): + a = np.random.rand(3, 2, 2).astype(np.float32) + b = np.random.rand(3, 2, 2).astype(np.float32) + beta = beta_lib.Beta(a, b) + n = constant_op.constant(100000) + samples = beta.sample(n) + sample_values = samples.eval() + self.assertEqual(sample_values.shape, (100000, 3, 2, 2)) + self.assertFalse(np.any(sample_values < 0.0)) + if not stats: + return + self.assertAllClose( + sample_values[:, 1, :].mean(axis=0), + stats.beta.mean(a, b)[1, :], + atol=1e-1) + + def testBetaCdf(self): + with self.test_session(): + shape = (30, 40, 50) + for dt in (np.float32, np.float64): + a = 10. * np.random.random(shape).astype(dt) + b = 10. * np.random.random(shape).astype(dt) + x = np.random.random(shape).astype(dt) + actual = beta_lib.Beta(a, b).cdf(x).eval() + self.assertAllEqual(np.ones(shape, dtype=np.bool), 0. <= x) + self.assertAllEqual(np.ones(shape, dtype=np.bool), 1. >= x) + if not stats: + return + self.assertAllClose(stats.beta.cdf(x, a, b), actual, rtol=1e-4, atol=0) + + def testBetaLogCdf(self): + with self.test_session(): + shape = (30, 40, 50) + for dt in (np.float32, np.float64): + a = 10. * np.random.random(shape).astype(dt) + b = 10. * np.random.random(shape).astype(dt) + x = np.random.random(shape).astype(dt) + actual = math_ops.exp(beta_lib.Beta(a, b).log_cdf(x)).eval() + self.assertAllEqual(np.ones(shape, dtype=np.bool), 0. <= x) + self.assertAllEqual(np.ones(shape, dtype=np.bool), 1. 
>= x) + if not stats: + return + self.assertAllClose(stats.beta.cdf(x, a, b), actual, rtol=1e-4, atol=0) + + def testBetaWithSoftplusConcentration(self): + with self.test_session(): + a, b = -4.2, -9.1 + dist = beta_lib.BetaWithSoftplusConcentration(a, b) + self.assertAllClose(nn_ops.softplus(a).eval(), dist.concentration1.eval()) + self.assertAllClose(nn_ops.softplus(b).eval(), dist.concentration0.eval()) + + def testBetaBetaKL(self): + with self.test_session() as sess: + for shape in [(10,), (4, 5)]: + a1 = 6.0 * np.random.random(size=shape) + 1e-4 + b1 = 6.0 * np.random.random(size=shape) + 1e-4 + a2 = 6.0 * np.random.random(size=shape) + 1e-4 + b2 = 6.0 * np.random.random(size=shape) + 1e-4 + # Take inverse softplus of values to test BetaWithSoftplusConcentration + a1_sp = np.log(np.exp(a1) - 1.0) + b1_sp = np.log(np.exp(b1) - 1.0) + a2_sp = np.log(np.exp(a2) - 1.0) + b2_sp = np.log(np.exp(b2) - 1.0) + + d1 = beta_lib.Beta(concentration1=a1, concentration0=b1) + d2 = beta_lib.Beta(concentration1=a2, concentration0=b2) + d1_sp = beta_lib.BetaWithSoftplusConcentration(concentration1=a1_sp, + concentration0=b1_sp) + d2_sp = beta_lib.BetaWithSoftplusConcentration(concentration1=a2_sp, + concentration0=b2_sp) + + if not special: + return + kl_expected = (special.betaln(a2, b2) - special.betaln(a1, b1) + + (a1 - a2) * special.digamma(a1) + + (b1 - b2) * special.digamma(b1) + + (a2 - a1 + b2 - b1) * special.digamma(a1 + b1)) + + for dist1 in [d1, d1_sp]: + for dist2 in [d2, d2_sp]: + kl = kullback_leibler.kl_divergence(dist1, dist2) + kl_val = sess.run(kl) + self.assertEqual(kl.get_shape(), shape) + self.assertAllClose(kl_val, kl_expected) + + # Make sure KL(d1||d1) is 0 + kl_same = sess.run(kullback_leibler.kl_divergence(d1, d1)) + self.assertAllClose(kl_same, np.zeros_like(kl_expected)) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/categorical_test.py b/tensorflow/python/kernel_tests/distributions/categorical_test.py new file mode 100644 index 0000000000..bfdb5fa9fe --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/categorical_test.py @@ -0,0 +1,297 @@ +# Copyright 2015 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for Categorical distribution.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import tensor_util +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import gradients_impl +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import categorical +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.platform import test + + +def make_categorical(batch_shape, num_classes, dtype=dtypes.int32): + logits = random_ops.random_uniform( + list(batch_shape) + [num_classes], -10, 10, dtype=dtypes.float32) - 50. + return categorical.Categorical(logits, dtype=dtype) + + +class CategoricalTest(test.TestCase): + + def testP(self): + p = [0.2, 0.8] + dist = categorical.Categorical(probs=p) + with self.test_session(): + self.assertAllClose(p, dist.probs.eval()) + self.assertAllEqual([2], dist.logits.get_shape()) + + def testLogits(self): + p = np.array([0.2, 0.8], dtype=np.float32) + logits = np.log(p) - 50. + dist = categorical.Categorical(logits=logits) + with self.test_session(): + self.assertAllEqual([2], dist.probs.get_shape()) + self.assertAllEqual([2], dist.logits.get_shape()) + self.assertAllClose(dist.probs.eval(), p) + self.assertAllClose(dist.logits.eval(), logits) + + def testShapes(self): + with self.test_session(): + for batch_shape in ([], [1], [2, 3, 4]): + dist = make_categorical(batch_shape, 10) + self.assertAllEqual(batch_shape, dist.batch_shape) + self.assertAllEqual(batch_shape, dist.batch_shape_tensor().eval()) + self.assertAllEqual([], dist.event_shape) + self.assertAllEqual([], dist.event_shape_tensor().eval()) + self.assertEqual(10, dist.event_size.eval()) + # event_size is available as a constant because the shape is + # known at graph build time. 
+ self.assertEqual(10, tensor_util.constant_value(dist.event_size)) + + for batch_shape in ([], [1], [2, 3, 4]): + dist = make_categorical( + batch_shape, constant_op.constant( + 10, dtype=dtypes.int32)) + self.assertAllEqual(len(batch_shape), dist.batch_shape.ndims) + self.assertAllEqual(batch_shape, dist.batch_shape_tensor().eval()) + self.assertAllEqual([], dist.event_shape) + self.assertAllEqual([], dist.event_shape_tensor().eval()) + self.assertEqual(10, dist.event_size.eval()) + + def testDtype(self): + dist = make_categorical([], 5, dtype=dtypes.int32) + self.assertEqual(dist.dtype, dtypes.int32) + self.assertEqual(dist.dtype, dist.sample(5).dtype) + self.assertEqual(dist.dtype, dist.mode().dtype) + dist = make_categorical([], 5, dtype=dtypes.int64) + self.assertEqual(dist.dtype, dtypes.int64) + self.assertEqual(dist.dtype, dist.sample(5).dtype) + self.assertEqual(dist.dtype, dist.mode().dtype) + self.assertEqual(dist.probs.dtype, dtypes.float32) + self.assertEqual(dist.logits.dtype, dtypes.float32) + self.assertEqual(dist.logits.dtype, dist.entropy().dtype) + self.assertEqual( + dist.logits.dtype, dist.prob(np.array( + 0, dtype=np.int64)).dtype) + self.assertEqual( + dist.logits.dtype, dist.log_prob(np.array( + 0, dtype=np.int64)).dtype) + + def testUnknownShape(self): + with self.test_session(): + logits = array_ops.placeholder(dtype=dtypes.float32) + dist = categorical.Categorical(logits) + sample = dist.sample() + # Will sample class 1. + sample_value = sample.eval(feed_dict={logits: [-1000.0, 1000.0]}) + self.assertEqual(1, sample_value) + + # Batch entry 0 will sample class 1, batch entry 1 will sample class 0. + sample_value_batch = sample.eval( + feed_dict={logits: [[-1000.0, 1000.0], [1000.0, -1000.0]]}) + self.assertAllEqual([1, 0], sample_value_batch) + + def testPMFWithBatch(self): + histograms = [[0.2, 0.8], [0.6, 0.4]] + dist = categorical.Categorical(math_ops.log(histograms) - 50.) + with self.test_session(): + self.assertAllClose(dist.prob([0, 1]).eval(), [0.2, 0.4]) + + def testPMFNoBatch(self): + histograms = [0.2, 0.8] + dist = categorical.Categorical(math_ops.log(histograms) - 50.) + with self.test_session(): + self.assertAllClose(dist.prob(0).eval(), 0.2) + + def testLogPMF(self): + logits = np.log([[0.2, 0.8], [0.6, 0.4]]) - 50. + dist = categorical.Categorical(logits) + with self.test_session(): + self.assertAllClose(dist.log_prob([0, 1]).eval(), np.log([0.2, 0.4])) + + def testEntropyNoBatch(self): + logits = np.log([0.2, 0.8]) - 50. + dist = categorical.Categorical(logits) + with self.test_session(): + self.assertAllClose(dist.entropy().eval(), + -(0.2 * np.log(0.2) + 0.8 * np.log(0.8))) + + def testEntropyWithBatch(self): + logits = np.log([[0.2, 0.8], [0.6, 0.4]]) - 50. 
+ dist = categorical.Categorical(logits) + with self.test_session(): + self.assertAllClose(dist.entropy().eval(), [ + -(0.2 * np.log(0.2) + 0.8 * np.log(0.8)), + -(0.6 * np.log(0.6) + 0.4 * np.log(0.4)) + ]) + + def testEntropyGradient(self): + with self.test_session() as sess: + logits = constant_op.constant([[1., 2., 3.], [2., 5., 1.]]) + + probabilities = nn_ops.softmax(logits) + log_probabilities = nn_ops.log_softmax(logits) + true_entropy = - math_ops.reduce_sum( + probabilities * log_probabilities, axis=-1) + + categorical_distribution = categorical.Categorical(probs=probabilities) + categorical_entropy = categorical_distribution.entropy() + + # works + true_entropy_g = gradients_impl.gradients(true_entropy, [logits]) + categorical_entropy_g = gradients_impl.gradients( + categorical_entropy, [logits]) + + res = sess.run({"true_entropy": true_entropy, + "categorical_entropy": categorical_entropy, + "true_entropy_g": true_entropy_g, + "categorical_entropy_g": categorical_entropy_g}) + self.assertAllClose(res["true_entropy"], + res["categorical_entropy"]) + self.assertAllClose(res["true_entropy_g"], + res["categorical_entropy_g"]) + + def testSample(self): + with self.test_session(): + histograms = [[[0.2, 0.8], [0.4, 0.6]]] + dist = categorical.Categorical(math_ops.log(histograms) - 50.) + n = 10000 + samples = dist.sample(n, seed=123) + samples.set_shape([n, 1, 2]) + self.assertEqual(samples.dtype, dtypes.int32) + sample_values = samples.eval() + self.assertFalse(np.any(sample_values < 0)) + self.assertFalse(np.any(sample_values > 1)) + self.assertAllClose( + [[0.2, 0.4]], np.mean( + sample_values == 0, axis=0), atol=1e-2) + self.assertAllClose( + [[0.8, 0.6]], np.mean( + sample_values == 1, axis=0), atol=1e-2) + + def testSampleWithSampleShape(self): + with self.test_session(): + histograms = [[[0.2, 0.8], [0.4, 0.6]]] + dist = categorical.Categorical(math_ops.log(histograms) - 50.) + samples = dist.sample((100, 100), seed=123) + prob = dist.prob(samples) + prob_val = prob.eval() + self.assertAllClose( + [0.2**2 + 0.8**2], [prob_val[:, :, :, 0].mean()], atol=1e-2) + self.assertAllClose( + [0.4**2 + 0.6**2], [prob_val[:, :, :, 1].mean()], atol=1e-2) + + def testLogPMFBroadcasting(self): + with self.test_session(): + histograms = [[[0.2, 0.8], [0.4, 0.6]]] + dist = categorical.Categorical(math_ops.log(histograms) - 50.) 
+ + prob = dist.prob(1) + self.assertAllClose([[0.8, 0.6]], prob.eval()) + + prob = dist.prob([1]) + self.assertAllClose([[0.8, 0.6]], prob.eval()) + + prob = dist.prob([0, 1]) + self.assertAllClose([[0.2, 0.6]], prob.eval()) + + prob = dist.prob([[0, 1]]) + self.assertAllClose([[0.2, 0.6]], prob.eval()) + + prob = dist.prob([[[0, 1]]]) + self.assertAllClose([[[0.2, 0.6]]], prob.eval()) + + prob = dist.prob([[1, 0], [0, 1]]) + self.assertAllClose([[0.8, 0.4], [0.2, 0.6]], prob.eval()) + + prob = dist.prob([[[1, 1], [1, 0]], [[1, 0], [0, 1]]]) + self.assertAllClose([[[0.8, 0.6], [0.8, 0.4]], [[0.8, 0.4], [0.2, 0.6]]], + prob.eval()) + + def testLogPMFShape(self): + with self.test_session(): + # shape [1, 2, 2] + histograms = [[[0.2, 0.8], [0.4, 0.6]]] + dist = categorical.Categorical(math_ops.log(histograms)) + + log_prob = dist.log_prob([0, 1]) + self.assertEqual(2, log_prob.get_shape().ndims) + self.assertAllEqual([1, 2], log_prob.get_shape()) + + log_prob = dist.log_prob([[[1, 1], [1, 0]], [[1, 0], [0, 1]]]) + self.assertEqual(3, log_prob.get_shape().ndims) + self.assertAllEqual([2, 2, 2], log_prob.get_shape()) + + def testLogPMFShapeNoBatch(self): + histograms = [0.2, 0.8] + dist = categorical.Categorical(math_ops.log(histograms)) + + log_prob = dist.log_prob(0) + self.assertEqual(0, log_prob.get_shape().ndims) + self.assertAllEqual([], log_prob.get_shape()) + + log_prob = dist.log_prob([[[1, 1], [1, 0]], [[1, 0], [0, 1]]]) + self.assertEqual(3, log_prob.get_shape().ndims) + self.assertAllEqual([2, 2, 2], log_prob.get_shape()) + + def testMode(self): + with self.test_session(): + histograms = [[[0.2, 0.8], [0.6, 0.4]]] + dist = categorical.Categorical(math_ops.log(histograms) - 50.) + self.assertAllEqual(dist.mode().eval(), [[1, 0]]) + + def testCategoricalCategoricalKL(self): + + def np_softmax(logits): + exp_logits = np.exp(logits) + return exp_logits / exp_logits.sum(axis=-1, keepdims=True) + + with self.test_session() as sess: + for categories in [2, 4]: + for batch_size in [1, 10]: + a_logits = np.random.randn(batch_size, categories) + b_logits = np.random.randn(batch_size, categories) + + a = categorical.Categorical(logits=a_logits) + b = categorical.Categorical(logits=b_logits) + + kl = kullback_leibler.kl_divergence(a, b) + kl_val = sess.run(kl) + # Make sure KL(a||a) is 0 + kl_same = sess.run(kullback_leibler.kl_divergence(a, a)) + + prob_a = np_softmax(a_logits) + prob_b = np_softmax(b_logits) + kl_expected = np.sum(prob_a * (np.log(prob_a) - np.log(prob_b)), + axis=-1) + + self.assertEqual(kl.get_shape(), (batch_size,)) + self.assertAllClose(kl_val, kl_expected) + self.assertAllClose(kl_same, np.zeros_like(kl_expected)) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/dirichlet_multinomial_test.py b/tensorflow/python/kernel_tests/distributions/dirichlet_multinomial_test.py new file mode 100644 index 0000000000..d009f4e931 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/dirichlet_multinomial_test.py @@ -0,0 +1,480 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import dirichlet_multinomial +from tensorflow.python.platform import test + + +ds = dirichlet_multinomial + + +class DirichletMultinomialTest(test.TestCase): + + def setUp(self): + self._rng = np.random.RandomState(42) + + def testSimpleShapes(self): + with self.test_session(): + alpha = np.random.rand(3) + dist = ds.DirichletMultinomial(1., alpha) + self.assertEqual(3, dist.event_shape_tensor().eval()) + self.assertAllEqual([], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([3]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([]), dist.batch_shape) + + def testComplexShapes(self): + with self.test_session(): + alpha = np.random.rand(3, 2, 2) + n = [[3., 2], [4, 5], [6, 7]] + dist = ds.DirichletMultinomial(n, alpha) + self.assertEqual(2, dist.event_shape_tensor().eval()) + self.assertAllEqual([3, 2], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([2]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([3, 2]), dist.batch_shape) + + def testNproperty(self): + alpha = [[1., 2, 3]] + n = [[5.]] + with self.test_session(): + dist = ds.DirichletMultinomial(n, alpha) + self.assertEqual([1, 1], dist.total_count.get_shape()) + self.assertAllClose(n, dist.total_count.eval()) + + def testAlphaProperty(self): + alpha = [[1., 2, 3]] + with self.test_session(): + dist = ds.DirichletMultinomial(1, alpha) + self.assertEqual([1, 3], dist.concentration.get_shape()) + self.assertAllClose(alpha, dist.concentration.eval()) + + def testPmfNandCountsAgree(self): + alpha = [[1., 2, 3]] + n = [[5.]] + with self.test_session(): + dist = ds.DirichletMultinomial(n, alpha, validate_args=True) + dist.prob([2., 3, 0]).eval() + dist.prob([3., 0, 2]).eval() + with self.assertRaisesOpError("counts must be non-negative"): + dist.prob([-1., 4, 2]).eval() + with self.assertRaisesOpError( + "counts last-dimension must sum to `self.total_count`"): + dist.prob([3., 3, 0]).eval() + + def testPmfNonIntegerCounts(self): + alpha = [[1., 2, 3]] + n = [[5.]] + with self.test_session(): + dist = ds.DirichletMultinomial(n, alpha, validate_args=True) + dist.prob([2., 3, 0]).eval() + dist.prob([3., 0, 2]).eval() + dist.prob([3.0, 0, 2.0]).eval() + # Both equality and integer checking fail. + placeholder = array_ops.placeholder(dtypes.float32) + with self.assertRaisesOpError( + "counts cannot contain fractional components"): + dist.prob(placeholder).eval(feed_dict={placeholder: [1.0, 2.5, 1.5]}) + dist = ds.DirichletMultinomial(n, alpha, validate_args=False) + dist.prob([1., 2., 3.]).eval() + # Non-integer arguments work. 
+ dist.prob([1.0, 2.5, 1.5]).eval() + + def testPmfBothZeroBatches(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + # Both zero-batches. No broadcast + alpha = [1., 2] + counts = [1., 0] + dist = ds.DirichletMultinomial(1., alpha) + pmf = dist.prob(counts) + self.assertAllClose(1 / 3., pmf.eval()) + self.assertEqual((), pmf.get_shape()) + + def testPmfBothZeroBatchesNontrivialN(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + # Both zero-batches. No broadcast + alpha = [1., 2] + counts = [3., 2] + dist = ds.DirichletMultinomial(5., alpha) + pmf = dist.prob(counts) + self.assertAllClose(1 / 7., pmf.eval()) + self.assertEqual((), pmf.get_shape()) + + def testPmfBothZeroBatchesMultidimensionalN(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + alpha = [1., 2] + counts = [3., 2] + n = np.full([4, 3], 5., dtype=np.float32) + dist = ds.DirichletMultinomial(n, alpha) + pmf = dist.prob(counts) + self.assertAllClose([[1 / 7., 1 / 7., 1 / 7.]] * 4, pmf.eval()) + self.assertEqual((4, 3), pmf.get_shape()) + + def testPmfAlphaStretchedInBroadcastWhenSameRank(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + alpha = [[1., 2]] + counts = [[1., 0], [0., 1]] + dist = ds.DirichletMultinomial([1.], alpha) + pmf = dist.prob(counts) + self.assertAllClose([1 / 3., 2 / 3.], pmf.eval()) + self.assertAllEqual([2], pmf.get_shape()) + + def testPmfAlphaStretchedInBroadcastWhenLowerRank(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + alpha = [1., 2] + counts = [[1., 0], [0., 1]] + pmf = ds.DirichletMultinomial(1., alpha).prob(counts) + self.assertAllClose([1 / 3., 2 / 3.], pmf.eval()) + self.assertAllEqual([2], pmf.get_shape()) + + def testPmfCountsStretchedInBroadcastWhenSameRank(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + alpha = [[1., 2], [2., 3]] + counts = [[1., 0]] + pmf = ds.DirichletMultinomial([1., 1.], alpha).prob(counts) + self.assertAllClose([1 / 3., 2 / 5.], pmf.eval()) + self.assertAllEqual([2], pmf.get_shape()) + + def testPmfCountsStretchedInBroadcastWhenLowerRank(self): + # The probabilities of one vote falling into class k is the mean for class + # k. + with self.test_session(): + alpha = [[1., 2], [2., 3]] + counts = [1., 0] + pmf = ds.DirichletMultinomial(1., alpha).prob(counts) + self.assertAllClose([1 / 3., 2 / 5.], pmf.eval()) + self.assertAllEqual([2], pmf.get_shape()) + + def testPmfForOneVoteIsTheMeanWithOneRecordInput(self): + # The probabilities of one vote falling into class k is the mean for class + # k. 
+ alpha = [1., 2, 3] + with self.test_session(): + for class_num in range(3): + counts = np.zeros([3], dtype=np.float32) + counts[class_num] = 1 + dist = ds.DirichletMultinomial(1., alpha) + mean = dist.mean().eval() + pmf = dist.prob(counts).eval() + + self.assertAllClose(mean[class_num], pmf) + self.assertAllEqual([3], mean.shape) + self.assertAllEqual([], pmf.shape) + + def testMeanDoubleTwoVotes(self): + # The probabilities of two votes falling into class k for + # DirichletMultinomial(2, alpha) is twice as much as the probability of one + # vote falling into class k for DirichletMultinomial(1, alpha) + alpha = [1., 2, 3] + with self.test_session(): + for class_num in range(3): + counts_one = np.zeros([3], dtype=np.float32) + counts_one[class_num] = 1. + counts_two = np.zeros([3], dtype=np.float32) + counts_two[class_num] = 2 + + dist1 = ds.DirichletMultinomial(1., alpha) + dist2 = ds.DirichletMultinomial(2., alpha) + + mean1 = dist1.mean().eval() + mean2 = dist2.mean().eval() + + self.assertAllClose(mean2[class_num], 2 * mean1[class_num]) + self.assertAllEqual([3], mean1.shape) + + def testCovarianceFromSampling(self): + # We will test mean, cov, var, stddev on a DirichletMultinomial constructed + # via broadcast between alpha, n. + alpha = np.array([[1., 2, 3], + [2.5, 4, 0.01]], dtype=np.float32) + # Ideally we'd be able to test broadcasting but, the multinomial sampler + # doesn't support different total counts. + n = np.float32(5) + with self.test_session() as sess: + # batch_shape=[2], event_shape=[3] + dist = ds.DirichletMultinomial(n, alpha) + x = dist.sample(int(250e3), seed=1) + sample_mean = math_ops.reduce_mean(x, 0) + x_centered = x - sample_mean[array_ops.newaxis, ...] + sample_cov = math_ops.reduce_mean(math_ops.matmul( + x_centered[..., array_ops.newaxis], + x_centered[..., array_ops.newaxis, :]), 0) + sample_var = array_ops.matrix_diag_part(sample_cov) + sample_stddev = math_ops.sqrt(sample_var) + [ + sample_mean_, + sample_cov_, + sample_var_, + sample_stddev_, + analytic_mean, + analytic_cov, + analytic_var, + analytic_stddev, + ] = sess.run([ + sample_mean, + sample_cov, + sample_var, + sample_stddev, + dist.mean(), + dist.covariance(), + dist.variance(), + dist.stddev(), + ]) + self.assertAllClose(sample_mean_, analytic_mean, atol=0., rtol=0.04) + self.assertAllClose(sample_cov_, analytic_cov, atol=0., rtol=0.05) + self.assertAllClose(sample_var_, analytic_var, atol=0., rtol=0.03) + self.assertAllClose(sample_stddev_, analytic_stddev, atol=0., rtol=0.02) + + def testCovariance(self): + # Shape [2] + alpha = [1., 2] + ns = [2., 3., 4., 5.] + alpha_0 = np.sum(alpha) + + # Diagonal entries are of the form: + # Var(X_i) = n * alpha_i / alpha_sum * (1 - alpha_i / alpha_sum) * + # (alpha_sum + n) / (alpha_sum + 1) + variance_entry = lambda a, a_sum: a / a_sum * (1 - a / a_sum) + # Off diagonal entries are of the form: + # Cov(X_i, X_j) = -n * alpha_i * alpha_j / (alpha_sum ** 2) * + # (alpha_sum + n) / (alpha_sum + 1) + covariance_entry = lambda a, b, a_sum: -a * b / a_sum**2 + # Shape [2, 2]. + shared_matrix = np.array([[ + variance_entry(alpha[0], alpha_0), + covariance_entry(alpha[0], alpha[1], alpha_0) + ], [ + covariance_entry(alpha[1], alpha[0], alpha_0), + variance_entry(alpha[1], alpha_0) + ]]) + + with self.test_session(): + for n in ns: + # n is shape [] and alpha is shape [2]. 
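+        # The full covariance is the shared [2, 2] matrix scaled by the common
+        # factor n * (n + alpha_0) / (1 + alpha_0), computed below for each n.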
+ dist = ds.DirichletMultinomial(n, alpha) + covariance = dist.covariance() + expected_covariance = n * (n + alpha_0) / (1 + alpha_0) * shared_matrix + + self.assertEqual([2, 2], covariance.get_shape()) + self.assertAllClose(expected_covariance, covariance.eval()) + + def testCovarianceNAlphaBroadcast(self): + alpha_v = [1., 2, 3] + alpha_0 = 6. + + # Shape [4, 3] + alpha = np.array(4 * [alpha_v], dtype=np.float32) + # Shape [4, 1] + ns = np.array([[2.], [3.], [4.], [5.]], dtype=np.float32) + + variance_entry = lambda a, a_sum: a / a_sum * (1 - a / a_sum) + covariance_entry = lambda a, b, a_sum: -a * b / a_sum**2 + # Shape [4, 3, 3] + shared_matrix = np.array( + 4 * [[[ + variance_entry(alpha_v[0], alpha_0), + covariance_entry(alpha_v[0], alpha_v[1], alpha_0), + covariance_entry(alpha_v[0], alpha_v[2], alpha_0) + ], [ + covariance_entry(alpha_v[1], alpha_v[0], alpha_0), + variance_entry(alpha_v[1], alpha_0), + covariance_entry(alpha_v[1], alpha_v[2], alpha_0) + ], [ + covariance_entry(alpha_v[2], alpha_v[0], alpha_0), + covariance_entry(alpha_v[2], alpha_v[1], alpha_0), + variance_entry(alpha_v[2], alpha_0) + ]]], + dtype=np.float32) + + with self.test_session(): + # ns is shape [4, 1], and alpha is shape [4, 3]. + dist = ds.DirichletMultinomial(ns, alpha) + covariance = dist.covariance() + expected_covariance = shared_matrix * ( + ns * (ns + alpha_0) / (1 + alpha_0))[..., array_ops.newaxis] + + self.assertEqual([4, 3, 3], covariance.get_shape()) + self.assertAllClose(expected_covariance, covariance.eval()) + + def testCovarianceMultidimensional(self): + alpha = np.random.rand(3, 5, 4).astype(np.float32) + alpha2 = np.random.rand(6, 3, 3).astype(np.float32) + + ns = np.random.randint(low=1, high=11, size=[3, 5, 1]).astype(np.float32) + ns2 = np.random.randint(low=1, high=11, size=[6, 1, 1]).astype(np.float32) + + with self.test_session(): + dist = ds.DirichletMultinomial(ns, alpha) + dist2 = ds.DirichletMultinomial(ns2, alpha2) + + covariance = dist.covariance() + covariance2 = dist2.covariance() + self.assertEqual([3, 5, 4, 4], covariance.get_shape()) + self.assertEqual([6, 3, 3, 3], covariance2.get_shape()) + + def testZeroCountsResultsInPmfEqualToOne(self): + # There is only one way for zero items to be selected, and this happens with + # probability 1. + alpha = [5, 0.5] + counts = [0., 0] + with self.test_session(): + dist = ds.DirichletMultinomial(0., alpha) + pmf = dist.prob(counts) + self.assertAllClose(1.0, pmf.eval()) + self.assertEqual((), pmf.get_shape()) + + def testLargeTauGivesPreciseProbabilities(self): + # If tau is large, we are doing coin flips with probability mu. + mu = np.array([0.1, 0.1, 0.8], dtype=np.float32) + tau = np.array([100.], dtype=np.float32) + alpha = tau * mu + + # One (three sided) coin flip. Prob[coin 3] = 0.8. + # Note that since it was one flip, value of tau didn't matter. + counts = [0., 0, 1] + with self.test_session(): + dist = ds.DirichletMultinomial(1., alpha) + pmf = dist.prob(counts) + self.assertAllClose(0.8, pmf.eval(), atol=1e-4) + self.assertEqual((), pmf.get_shape()) + + # Two (three sided) coin flips. Prob[coin 3] = 0.8. + counts = [0., 0, 2] + with self.test_session(): + dist = ds.DirichletMultinomial(2., alpha) + pmf = dist.prob(counts) + self.assertAllClose(0.8**2, pmf.eval(), atol=1e-2) + self.assertEqual((), pmf.get_shape()) + + # Three (three sided) coin flips. 
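+    # Prob[one flip lands on class 1 and two land on class 3] ~= 3 * 0.1 * 0.8**2;
+    # the factor of 3 counts the possible orderings of the single class-1 flip.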
+ counts = [1., 0, 2] + with self.test_session(): + dist = ds.DirichletMultinomial(3., alpha) + pmf = dist.prob(counts) + self.assertAllClose(3 * 0.1 * 0.8 * 0.8, pmf.eval(), atol=1e-2) + self.assertEqual((), pmf.get_shape()) + + def testSmallTauPrefersCorrelatedResults(self): + # If tau is small, then correlation between draws is large, so draws that + # are both of the same class are more likely. + mu = np.array([0.5, 0.5], dtype=np.float32) + tau = np.array([0.1], dtype=np.float32) + alpha = tau * mu + + # If there is only one draw, it is still a coin flip, even with small tau. + counts = [1., 0] + with self.test_session(): + dist = ds.DirichletMultinomial(1., alpha) + pmf = dist.prob(counts) + self.assertAllClose(0.5, pmf.eval()) + self.assertEqual((), pmf.get_shape()) + + # If there are two draws, it is much more likely that they are the same. + counts_same = [2., 0] + counts_different = [1, 1.] + with self.test_session(): + dist = ds.DirichletMultinomial(2., alpha) + pmf_same = dist.prob(counts_same) + pmf_different = dist.prob(counts_different) + self.assertLess(5 * pmf_different.eval(), pmf_same.eval()) + self.assertEqual((), pmf_same.get_shape()) + + def testNonStrictTurnsOffAllChecks(self): + # Make totally invalid input. + with self.test_session(): + alpha = [[-1., 2]] # alpha should be positive. + counts = [[1., 0], [0., -1]] # counts should be non-negative. + n = [-5.3] # n should be a non negative integer equal to counts.sum. + dist = ds.DirichletMultinomial(n, alpha, validate_args=False) + dist.prob(counts).eval() # Should not raise. + + def testSampleUnbiasedNonScalarBatch(self): + with self.test_session() as sess: + dist = ds.DirichletMultinomial( + total_count=5., + concentration=1. + 2. * self._rng.rand(4, 3, 2).astype(np.float32)) + n = int(3e3) + x = dist.sample(n, seed=0) + sample_mean = math_ops.reduce_mean(x, 0) + # Cyclically rotate event dims left. + x_centered = array_ops.transpose(x - sample_mean, [1, 2, 3, 0]) + sample_covariance = math_ops.matmul( + x_centered, x_centered, adjoint_b=True) / n + [ + sample_mean_, + sample_covariance_, + actual_mean_, + actual_covariance_, + ] = sess.run([ + sample_mean, + sample_covariance, + dist.mean(), + dist.covariance(), + ]) + self.assertAllEqual([4, 3, 2], sample_mean.get_shape()) + self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.15) + self.assertAllEqual([4, 3, 2, 2], sample_covariance.get_shape()) + self.assertAllClose( + actual_covariance_, sample_covariance_, atol=0., rtol=0.20) + + def testSampleUnbiasedScalarBatch(self): + with self.test_session() as sess: + dist = ds.DirichletMultinomial( + total_count=5., + concentration=1. + 2. * self._rng.rand(4).astype(np.float32)) + n = int(5e3) + x = dist.sample(n, seed=0) + sample_mean = math_ops.reduce_mean(x, 0) + x_centered = x - sample_mean # Already transposed to [n, 2]. 
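+      # adjoint_a=True contracts over the sample axis, giving the [4, 4]
+      # empirical covariance of the event dimensions.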
+ sample_covariance = math_ops.matmul( + x_centered, x_centered, adjoint_a=True) / n + [ + sample_mean_, + sample_covariance_, + actual_mean_, + actual_covariance_, + ] = sess.run([ + sample_mean, + sample_covariance, + dist.mean(), + dist.covariance(), + ]) + self.assertAllEqual([4], sample_mean.get_shape()) + self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.05) + self.assertAllEqual([4, 4], sample_covariance.get_shape()) + self.assertAllClose( + actual_covariance_, sample_covariance_, atol=0., rtol=0.15) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/dirichlet_test.py b/tensorflow/python/kernel_tests/distributions/dirichlet_test.py new file mode 100644 index 0000000000..a2f1de5aaf --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/dirichlet_test.py @@ -0,0 +1,263 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import dirichlet as dirichlet_lib +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +stats = try_import("scipy.stats") + + +class DirichletTest(test.TestCase): + + def testSimpleShapes(self): + with self.test_session(): + alpha = np.random.rand(3) + dist = dirichlet_lib.Dirichlet(alpha) + self.assertEqual(3, dist.event_shape_tensor().eval()) + self.assertAllEqual([], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([3]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([]), dist.batch_shape) + + def testComplexShapes(self): + with self.test_session(): + alpha = np.random.rand(3, 2, 2) + dist = dirichlet_lib.Dirichlet(alpha) + self.assertEqual(2, dist.event_shape_tensor().eval()) + self.assertAllEqual([3, 2], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([2]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([3, 2]), dist.batch_shape) + + def testConcentrationProperty(self): + alpha = [[1., 2, 3]] + with self.test_session(): + dist = dirichlet_lib.Dirichlet(alpha) + self.assertEqual([1, 3], dist.concentration.get_shape()) + self.assertAllClose(alpha, dist.concentration.eval()) + + def testPdfXProper(self): + alpha = [[1., 2, 3]] + with self.test_session(): + dist = dirichlet_lib.Dirichlet(alpha, 
validate_args=True) + dist.prob([.1, .3, .6]).eval() + dist.prob([.2, .3, .5]).eval() + # Either condition can trigger. + with self.assertRaisesOpError("samples must be positive"): + dist.prob([-1., 1.5, 0.5]).eval() + with self.assertRaisesOpError("samples must be positive"): + dist.prob([0., .1, .9]).eval() + with self.assertRaisesOpError( + "sample last-dimension must sum to `1`"): + dist.prob([.1, .2, .8]).eval() + + def testPdfZeroBatches(self): + with self.test_session(): + alpha = [1., 2] + x = [.5, .5] + dist = dirichlet_lib.Dirichlet(alpha) + pdf = dist.prob(x) + self.assertAllClose(1., pdf.eval()) + self.assertEqual((), pdf.get_shape()) + + def testPdfZeroBatchesNontrivialX(self): + with self.test_session(): + alpha = [1., 2] + x = [.3, .7] + dist = dirichlet_lib.Dirichlet(alpha) + pdf = dist.prob(x) + self.assertAllClose(7. / 5, pdf.eval()) + self.assertEqual((), pdf.get_shape()) + + def testPdfUniformZeroBatches(self): + with self.test_session(): + # Corresponds to a uniform distribution + alpha = [1., 1, 1] + x = [[.2, .5, .3], [.3, .4, .3]] + dist = dirichlet_lib.Dirichlet(alpha) + pdf = dist.prob(x) + self.assertAllClose([2., 2.], pdf.eval()) + self.assertEqual((2), pdf.get_shape()) + + def testPdfAlphaStretchedInBroadcastWhenSameRank(self): + with self.test_session(): + alpha = [[1., 2]] + x = [[.5, .5], [.3, .7]] + dist = dirichlet_lib.Dirichlet(alpha) + pdf = dist.prob(x) + self.assertAllClose([1., 7. / 5], pdf.eval()) + self.assertEqual((2), pdf.get_shape()) + + def testPdfAlphaStretchedInBroadcastWhenLowerRank(self): + with self.test_session(): + alpha = [1., 2] + x = [[.5, .5], [.2, .8]] + pdf = dirichlet_lib.Dirichlet(alpha).prob(x) + self.assertAllClose([1., 8. / 5], pdf.eval()) + self.assertEqual((2), pdf.get_shape()) + + def testPdfXStretchedInBroadcastWhenSameRank(self): + with self.test_session(): + alpha = [[1., 2], [2., 3]] + x = [[.5, .5]] + pdf = dirichlet_lib.Dirichlet(alpha).prob(x) + self.assertAllClose([1., 3. / 2], pdf.eval()) + self.assertEqual((2), pdf.get_shape()) + + def testPdfXStretchedInBroadcastWhenLowerRank(self): + with self.test_session(): + alpha = [[1., 2], [2., 3]] + x = [.5, .5] + pdf = dirichlet_lib.Dirichlet(alpha).prob(x) + self.assertAllClose([1., 3. / 2], pdf.eval()) + self.assertEqual((2), pdf.get_shape()) + + def testMean(self): + with self.test_session(): + alpha = [1., 2, 3] + dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) + self.assertEqual(dirichlet.mean().get_shape(), [3]) + if not stats: + return + expected_mean = stats.dirichlet.mean(alpha) + self.assertAllClose(dirichlet.mean().eval(), expected_mean) + + def testCovarianceFromSampling(self): + alpha = np.array([[1., 2, 3], + [2.5, 4, 0.01]], dtype=np.float32) + with self.test_session() as sess: + dist = dirichlet_lib.Dirichlet(alpha) # batch_shape=[2], event_shape=[3] + x = dist.sample(int(250e3), seed=1) + sample_mean = math_ops.reduce_mean(x, 0) + x_centered = x - sample_mean[None, ...] 
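+      # The mean outer product of the centered samples estimates the covariance.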
+ sample_cov = math_ops.reduce_mean(math_ops.matmul( + x_centered[..., None], x_centered[..., None, :]), 0) + sample_var = array_ops.matrix_diag_part(sample_cov) + sample_stddev = math_ops.sqrt(sample_var) + [ + sample_mean_, + sample_cov_, + sample_var_, + sample_stddev_, + analytic_mean, + analytic_cov, + analytic_var, + analytic_stddev, + ] = sess.run([ + sample_mean, + sample_cov, + sample_var, + sample_stddev, + dist.mean(), + dist.covariance(), + dist.variance(), + dist.stddev(), + ]) + self.assertAllClose(sample_mean_, analytic_mean, atol=0., rtol=0.04) + self.assertAllClose(sample_cov_, analytic_cov, atol=0., rtol=0.06) + self.assertAllClose(sample_var_, analytic_var, atol=0., rtol=0.03) + self.assertAllClose(sample_stddev_, analytic_stddev, atol=0., rtol=0.02) + + def testVariance(self): + with self.test_session(): + alpha = [1., 2, 3] + denominator = np.sum(alpha)**2 * (np.sum(alpha) + 1) + dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) + self.assertEqual(dirichlet.covariance().get_shape(), (3, 3)) + if not stats: + return + expected_covariance = np.diag(stats.dirichlet.var(alpha)) + expected_covariance += [[0., -2, -3], [-2, 0, -6], + [-3, -6, 0]] / denominator + self.assertAllClose(dirichlet.covariance().eval(), expected_covariance) + + def testMode(self): + with self.test_session(): + alpha = np.array([1.1, 2, 3]) + expected_mode = (alpha - 1) / (np.sum(alpha) - 3) + dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) + self.assertEqual(dirichlet.mode().get_shape(), [3]) + self.assertAllClose(dirichlet.mode().eval(), expected_mode) + + def testModeInvalid(self): + with self.test_session(): + alpha = np.array([1., 2, 3]) + dirichlet = dirichlet_lib.Dirichlet(concentration=alpha, + allow_nan_stats=False) + with self.assertRaisesOpError("Condition x < y.*"): + dirichlet.mode().eval() + + def testModeEnableAllowNanStats(self): + with self.test_session(): + alpha = np.array([1., 2, 3]) + dirichlet = dirichlet_lib.Dirichlet(concentration=alpha, + allow_nan_stats=True) + expected_mode = np.zeros_like(alpha) + np.nan + + self.assertEqual(dirichlet.mode().get_shape(), [3]) + self.assertAllClose(dirichlet.mode().eval(), expected_mode) + + def testEntropy(self): + with self.test_session(): + alpha = [1., 2, 3] + dirichlet = dirichlet_lib.Dirichlet(concentration=alpha) + self.assertEqual(dirichlet.entropy().get_shape(), ()) + if not stats: + return + expected_entropy = stats.dirichlet.entropy(alpha) + self.assertAllClose(dirichlet.entropy().eval(), expected_entropy) + + def testSample(self): + with self.test_session(): + alpha = [1., 2] + dirichlet = dirichlet_lib.Dirichlet(alpha) + n = constant_op.constant(100000) + samples = dirichlet.sample(n) + sample_values = samples.eval() + self.assertEqual(sample_values.shape, (100000, 2)) + self.assertTrue(np.all(sample_values > 0.0)) + if not stats: + return + self.assertLess( + stats.kstest( + # Beta is a univariate distribution. + sample_values[:, 0], + stats.beta( + a=1., b=2.).cdf)[0], + 0.01) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/exponential_test.py b/tensorflow/python/kernel_tests/distributions/exponential_test.py new file mode 100644 index 0000000000..7afdf0f947 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/exponential_test.py @@ -0,0 +1,171 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tests for initializers.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.client import session +from tensorflow.python.framework import constant_op +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops.distributions import exponential as exponential_lib +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +stats = try_import("scipy.stats") + + +class ExponentialTest(test.TestCase): + + def testExponentialLogPDF(self): + with session.Session(): + batch_size = 6 + lam = constant_op.constant([2.0] * batch_size) + lam_v = 2.0 + x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) + exponential = exponential_lib.Exponential(rate=lam) + + log_pdf = exponential.log_prob(x) + self.assertEqual(log_pdf.get_shape(), (6,)) + + pdf = exponential.prob(x) + self.assertEqual(pdf.get_shape(), (6,)) + + if not stats: + return + expected_log_pdf = stats.expon.logpdf(x, scale=1 / lam_v) + self.assertAllClose(log_pdf.eval(), expected_log_pdf) + self.assertAllClose(pdf.eval(), np.exp(expected_log_pdf)) + + def testExponentialCDF(self): + with session.Session(): + batch_size = 6 + lam = constant_op.constant([2.0] * batch_size) + lam_v = 2.0 + x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) + + exponential = exponential_lib.Exponential(rate=lam) + + cdf = exponential.cdf(x) + self.assertEqual(cdf.get_shape(), (6,)) + + if not stats: + return + expected_cdf = stats.expon.cdf(x, scale=1 / lam_v) + self.assertAllClose(cdf.eval(), expected_cdf) + + def testExponentialMean(self): + with session.Session(): + lam_v = np.array([1.0, 4.0, 2.5]) + exponential = exponential_lib.Exponential(rate=lam_v) + self.assertEqual(exponential.mean().get_shape(), (3,)) + if not stats: + return + expected_mean = stats.expon.mean(scale=1 / lam_v) + self.assertAllClose(exponential.mean().eval(), expected_mean) + + def testExponentialVariance(self): + with session.Session(): + lam_v = np.array([1.0, 4.0, 2.5]) + exponential = exponential_lib.Exponential(rate=lam_v) + self.assertEqual(exponential.variance().get_shape(), (3,)) + if not stats: + return + expected_variance = stats.expon.var(scale=1 / lam_v) + self.assertAllClose(exponential.variance().eval(), expected_variance) + + def testExponentialEntropy(self): + with session.Session(): + lam_v = np.array([1.0, 4.0, 2.5]) + exponential = exponential_lib.Exponential(rate=lam_v) + self.assertEqual(exponential.entropy().get_shape(), (3,)) + if not stats: + return + expected_entropy = stats.expon.entropy(scale=1 / lam_v) 
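+      # Closed form: the entropy of Exponential(rate=lam) is 1 - log(lam).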
+ self.assertAllClose(exponential.entropy().eval(), expected_entropy) + + def testExponentialSample(self): + with self.test_session(): + lam = constant_op.constant([3.0, 4.0]) + lam_v = [3.0, 4.0] + n = constant_op.constant(100000) + exponential = exponential_lib.Exponential(rate=lam) + + samples = exponential.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(sample_values.shape, (100000, 2)) + self.assertFalse(np.any(sample_values < 0.0)) + if not stats: + return + for i in range(2): + self.assertLess( + stats.kstest( + sample_values[:, i], stats.expon(scale=1.0 / lam_v[i]).cdf)[0], + 0.01) + + def testExponentialSampleMultiDimensional(self): + with self.test_session(): + batch_size = 2 + lam_v = [3.0, 22.0] + lam = constant_op.constant([lam_v] * batch_size) + + exponential = exponential_lib.Exponential(rate=lam) + + n = 100000 + samples = exponential.sample(n, seed=138) + self.assertEqual(samples.get_shape(), (n, batch_size, 2)) + + sample_values = samples.eval() + + self.assertFalse(np.any(sample_values < 0.0)) + if not stats: + return + for i in range(2): + self.assertLess( + stats.kstest( + sample_values[:, 0, i], + stats.expon(scale=1.0 / lam_v[i]).cdf)[0], + 0.01) + self.assertLess( + stats.kstest( + sample_values[:, 1, i], + stats.expon(scale=1.0 / lam_v[i]).cdf)[0], + 0.01) + + def testExponentialWithSoftplusRate(self): + with self.test_session(): + lam = [-2.2, -3.4] + exponential = exponential_lib.ExponentialWithSoftplusRate(rate=lam) + self.assertAllClose(nn_ops.softplus(lam).eval(), + exponential.rate.eval()) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/gamma_test.py b/tensorflow/python/kernel_tests/distributions/gamma_test.py new file mode 100644 index 0000000000..5e4813ac07 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/gamma_test.py @@ -0,0 +1,406 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.client import session +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops.distributions import gamma as gamma_lib +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +special = try_import("scipy.special") +stats = try_import("scipy.stats") + + +class GammaTest(test.TestCase): + + def testGammaShape(self): + with self.test_session(): + alpha = constant_op.constant([3.0] * 5) + beta = constant_op.constant(11.0) + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + + self.assertEqual(gamma.batch_shape_tensor().eval(), (5,)) + self.assertEqual(gamma.batch_shape, tensor_shape.TensorShape([5])) + self.assertAllEqual(gamma.event_shape_tensor().eval(), []) + self.assertEqual(gamma.event_shape, tensor_shape.TensorShape([])) + + def testGammaLogPDF(self): + with self.test_session(): + batch_size = 6 + alpha = constant_op.constant([2.0] * batch_size) + beta = constant_op.constant([3.0] * batch_size) + alpha_v = 2.0 + beta_v = 3.0 + x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + log_pdf = gamma.log_prob(x) + self.assertEqual(log_pdf.get_shape(), (6,)) + pdf = gamma.prob(x) + self.assertEqual(pdf.get_shape(), (6,)) + if not stats: + return + expected_log_pdf = stats.gamma.logpdf(x, alpha_v, scale=1 / beta_v) + self.assertAllClose(log_pdf.eval(), expected_log_pdf) + self.assertAllClose(pdf.eval(), np.exp(expected_log_pdf)) + + def testGammaLogPDFMultidimensional(self): + with self.test_session(): + batch_size = 6 + alpha = constant_op.constant([[2.0, 4.0]] * batch_size) + beta = constant_op.constant([[3.0, 4.0]] * batch_size) + alpha_v = np.array([2.0, 4.0]) + beta_v = np.array([3.0, 4.0]) + x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + log_pdf = gamma.log_prob(x) + log_pdf_values = log_pdf.eval() + self.assertEqual(log_pdf.get_shape(), (6, 2)) + pdf = gamma.prob(x) + pdf_values = pdf.eval() + self.assertEqual(pdf.get_shape(), (6, 2)) + if not stats: + return + expected_log_pdf = stats.gamma.logpdf(x, alpha_v, scale=1 / beta_v) + self.assertAllClose(log_pdf_values, expected_log_pdf) + self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) + + def testGammaLogPDFMultidimensionalBroadcasting(self): + with self.test_session(): + batch_size = 6 + alpha = constant_op.constant([[2.0, 4.0]] * batch_size) + beta = constant_op.constant(3.0) + alpha_v = np.array([2.0, 4.0]) + beta_v = 3.0 + x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + log_pdf = gamma.log_prob(x) + log_pdf_values = log_pdf.eval() + self.assertEqual(log_pdf.get_shape(), (6, 2)) + pdf = gamma.prob(x) + pdf_values = pdf.eval() + 
self.assertEqual(pdf.get_shape(), (6, 2)) + + if not stats: + return + expected_log_pdf = stats.gamma.logpdf(x, alpha_v, scale=1 / beta_v) + self.assertAllClose(log_pdf_values, expected_log_pdf) + self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) + + def testGammaCDF(self): + with self.test_session(): + batch_size = 6 + alpha = constant_op.constant([2.0] * batch_size) + beta = constant_op.constant([3.0] * batch_size) + alpha_v = 2.0 + beta_v = 3.0 + x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) + + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + cdf = gamma.cdf(x) + self.assertEqual(cdf.get_shape(), (6,)) + if not stats: + return + expected_cdf = stats.gamma.cdf(x, alpha_v, scale=1 / beta_v) + self.assertAllClose(cdf.eval(), expected_cdf) + + def testGammaMean(self): + with self.test_session(): + alpha_v = np.array([1.0, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) + self.assertEqual(gamma.mean().get_shape(), (3,)) + if not stats: + return + expected_means = stats.gamma.mean(alpha_v, scale=1 / beta_v) + self.assertAllClose(gamma.mean().eval(), expected_means) + + def testGammaModeAllowNanStatsIsFalseWorksWhenAllBatchMembersAreDefined(self): + with self.test_session(): + alpha_v = np.array([5.5, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) + expected_modes = (alpha_v - 1) / beta_v + self.assertEqual(gamma.mode().get_shape(), (3,)) + self.assertAllClose(gamma.mode().eval(), expected_modes) + + def testGammaModeAllowNanStatsFalseRaisesForUndefinedBatchMembers(self): + with self.test_session(): + # Mode will not be defined for the first entry. + alpha_v = np.array([0.5, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, + rate=beta_v, + allow_nan_stats=False) + with self.assertRaisesOpError("x < y"): + gamma.mode().eval() + + def testGammaModeAllowNanStatsIsTrueReturnsNaNforUndefinedBatchMembers(self): + with self.test_session(): + # Mode will not be defined for the first entry. + alpha_v = np.array([0.5, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, + rate=beta_v, + allow_nan_stats=True) + expected_modes = (alpha_v - 1) / beta_v + expected_modes[0] = np.nan + self.assertEqual(gamma.mode().get_shape(), (3,)) + self.assertAllClose(gamma.mode().eval(), expected_modes) + + def testGammaVariance(self): + with self.test_session(): + alpha_v = np.array([1.0, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) + self.assertEqual(gamma.variance().get_shape(), (3,)) + if not stats: + return + expected_variances = stats.gamma.var(alpha_v, scale=1 / beta_v) + self.assertAllClose(gamma.variance().eval(), expected_variances) + + def testGammaStd(self): + with self.test_session(): + alpha_v = np.array([1.0, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) + self.assertEqual(gamma.stddev().get_shape(), (3,)) + if not stats: + return + expected_stddev = stats.gamma.std(alpha_v, scale=1. 
/ beta_v) + self.assertAllClose(gamma.stddev().eval(), expected_stddev) + + def testGammaEntropy(self): + with self.test_session(): + alpha_v = np.array([1.0, 3.0, 2.5]) + beta_v = np.array([1.0, 4.0, 5.0]) + gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) + self.assertEqual(gamma.entropy().get_shape(), (3,)) + if not stats: + return + expected_entropy = stats.gamma.entropy(alpha_v, scale=1 / beta_v) + self.assertAllClose(gamma.entropy().eval(), expected_entropy) + + def testGammaSampleSmallAlpha(self): + with session.Session(): + alpha_v = 0.05 + beta_v = 1.0 + alpha = constant_op.constant(alpha_v) + beta = constant_op.constant(beta_v) + n = 100000 + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + samples = gamma.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(samples.get_shape(), (n,)) + self.assertEqual(sample_values.shape, (n,)) + self.assertTrue(self._kstest(alpha_v, beta_v, sample_values)) + if not stats: + return + self.assertAllClose( + sample_values.mean(), + stats.gamma.mean( + alpha_v, scale=1 / beta_v), + atol=.01) + self.assertAllClose( + sample_values.var(), + stats.gamma.var(alpha_v, scale=1 / beta_v), + atol=.15) + + def testGammaSample(self): + with session.Session(): + alpha_v = 4.0 + beta_v = 3.0 + alpha = constant_op.constant(alpha_v) + beta = constant_op.constant(beta_v) + n = 100000 + gamma = gamma_lib.Gamma(concentration=alpha, rate=beta) + samples = gamma.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(samples.get_shape(), (n,)) + self.assertEqual(sample_values.shape, (n,)) + self.assertTrue(self._kstest(alpha_v, beta_v, sample_values)) + if not stats: + return + self.assertAllClose( + sample_values.mean(), + stats.gamma.mean( + alpha_v, scale=1 / beta_v), + atol=.01) + self.assertAllClose( + sample_values.var(), + stats.gamma.var(alpha_v, scale=1 / beta_v), + atol=.15) + + def testGammaSampleMultiDimensional(self): + with session.Session(): + alpha_v = np.array([np.arange(1, 101, dtype=np.float32)]) # 1 x 100 + beta_v = np.array([np.arange(1, 11, dtype=np.float32)]).T # 10 x 1 + gamma = gamma_lib.Gamma(concentration=alpha_v, rate=beta_v) + n = 10000 + samples = gamma.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(samples.get_shape(), (n, 10, 100)) + self.assertEqual(sample_values.shape, (n, 10, 100)) + zeros = np.zeros_like(alpha_v + beta_v) # 10 x 100 + alpha_bc = alpha_v + zeros + beta_bc = beta_v + zeros + if not stats: + return + self.assertAllClose( + sample_values.mean(axis=0), + stats.gamma.mean( + alpha_bc, scale=1 / beta_bc), + rtol=.035) + self.assertAllClose( + sample_values.var(axis=0), + stats.gamma.var(alpha_bc, scale=1 / beta_bc), + atol=4.5) + fails = 0 + trials = 0 + for ai, a in enumerate(np.reshape(alpha_v, [-1])): + for bi, b in enumerate(np.reshape(beta_v, [-1])): + s = sample_values[:, bi, ai] + trials += 1 + fails += 0 if self._kstest(a, b, s) else 1 + self.assertLess(fails, trials * 0.03) + + def _kstest(self, alpha, beta, samples): + # Uses the Kolmogorov-Smirnov test for goodness of fit. + if not stats: + return True # If we can't test, return that the test passes. + ks, _ = stats.kstest(samples, stats.gamma(alpha, scale=1 / beta).cdf) + # Return True when the test passes. 
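+    # ks is the largest gap between the empirical CDF of the samples and the
+    # analytic gamma CDF; 0.02 is a loose threshold for this sample size.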
+ return ks < 0.02 + + def testGammaPdfOfSampleMultiDims(self): + with session.Session() as sess: + gamma = gamma_lib.Gamma(concentration=[7., 11.], rate=[[5.], [6.]]) + num = 50000 + samples = gamma.sample(num, seed=137) + pdfs = gamma.prob(samples) + sample_vals, pdf_vals = sess.run([samples, pdfs]) + self.assertEqual(samples.get_shape(), (num, 2, 2)) + self.assertEqual(pdfs.get_shape(), (num, 2, 2)) + self._assertIntegral(sample_vals[:, 0, 0], pdf_vals[:, 0, 0], err=0.02) + self._assertIntegral(sample_vals[:, 0, 1], pdf_vals[:, 0, 1], err=0.02) + self._assertIntegral(sample_vals[:, 1, 0], pdf_vals[:, 1, 0], err=0.02) + self._assertIntegral(sample_vals[:, 1, 1], pdf_vals[:, 1, 1], err=0.02) + if not stats: + return + self.assertAllClose( + stats.gamma.mean( + [[7., 11.], [7., 11.]], scale=1 / np.array([[5., 5.], [6., 6.]])), + sample_vals.mean(axis=0), + atol=.1) + self.assertAllClose( + stats.gamma.var([[7., 11.], [7., 11.]], + scale=1 / np.array([[5., 5.], [6., 6.]])), + sample_vals.var(axis=0), + atol=.1) + + def _assertIntegral(self, sample_vals, pdf_vals, err=1e-3): + s_p = zip(sample_vals, pdf_vals) + prev = (0, 0) + total = 0 + for k in sorted(s_p, key=lambda x: x[0]): + pair_pdf = (k[1] + prev[1]) / 2 + total += (k[0] - prev[0]) * pair_pdf + prev = k + self.assertNear(1., total, err=err) + + def testGammaNonPositiveInitializationParamsRaises(self): + with self.test_session(): + alpha_v = constant_op.constant(0.0, name="alpha") + beta_v = constant_op.constant(1.0, name="beta") + gamma = gamma_lib.Gamma(concentration=alpha_v, + rate=beta_v, + validate_args=True) + with self.assertRaisesOpError("alpha"): + gamma.mean().eval() + alpha_v = constant_op.constant(1.0, name="alpha") + beta_v = constant_op.constant(0.0, name="beta") + gamma = gamma_lib.Gamma(concentration=alpha_v, + rate=beta_v, + validate_args=True) + with self.assertRaisesOpError("beta"): + gamma.mean().eval() + + def testGammaWithSoftplusConcentrationRate(self): + with self.test_session(): + alpha_v = constant_op.constant([0.0, -2.1], name="alpha") + beta_v = constant_op.constant([1.0, -3.6], name="beta") + gamma = gamma_lib.GammaWithSoftplusConcentrationRate( + concentration=alpha_v, rate=beta_v) + self.assertAllEqual(nn_ops.softplus(alpha_v).eval(), + gamma.concentration.eval()) + self.assertAllEqual(nn_ops.softplus(beta_v).eval(), + gamma.rate.eval()) + + def testGammaGammaKL(self): + alpha0 = np.array([3.]) + beta0 = np.array([1., 2., 3., 1.5, 2.5, 3.5]) + + alpha1 = np.array([0.4]) + beta1 = np.array([0.5, 1., 1.5, 2., 2.5, 3.]) + + # Build graph. + with self.test_session() as sess: + g0 = gamma_lib.Gamma(concentration=alpha0, rate=beta0) + g1 = gamma_lib.Gamma(concentration=alpha1, rate=beta1) + x = g0.sample(int(1e4), seed=0) + kl_sample = math_ops.reduce_mean(g0.log_prob(x) - g1.log_prob(x), 0) + kl_actual = kullback_leibler.kl_divergence(g0, g1) + + # Execute graph. 
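+      # kl_sample is a Monte Carlo estimate of KL(g0 || g1) from draws of g0;
+      # kl_actual is the closed-form divergence.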
+ [kl_sample_, kl_actual_] = sess.run([kl_sample, kl_actual]) + + self.assertEqual(beta0.shape, kl_actual.get_shape()) + + if not special: + return + kl_expected = ((alpha0 - alpha1) * special.digamma(alpha0) + + special.gammaln(alpha1) + - special.gammaln(alpha0) + + alpha1 * np.log(beta0) + - alpha1 * np.log(beta1) + + alpha0 * (beta1 / beta0 - 1.)) + + self.assertAllClose(kl_expected, kl_actual_, atol=0., rtol=1e-6) + self.assertAllClose(kl_sample_, kl_actual_, atol=0., rtol=1e-2) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/laplace_test.py b/tensorflow/python/kernel_tests/distributions/laplace_test.py new file mode 100644 index 0000000000..55577386c4 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/laplace_test.py @@ -0,0 +1,362 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.client import session +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops.distributions import laplace as laplace_lib +from tensorflow.python.platform import test + +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +stats = try_import("scipy.stats") + + +class LaplaceTest(test.TestCase): + + def testLaplaceShape(self): + with self.test_session(): + loc = constant_op.constant([3.0] * 5) + scale = constant_op.constant(11.0) + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + + self.assertEqual(laplace.batch_shape_tensor().eval(), (5,)) + self.assertEqual(laplace.batch_shape, tensor_shape.TensorShape([5])) + self.assertAllEqual(laplace.event_shape_tensor().eval(), []) + self.assertEqual(laplace.event_shape, tensor_shape.TensorShape([])) + + def testLaplaceLogPDF(self): + with self.test_session(): + batch_size = 6 + loc = constant_op.constant([2.0] * batch_size) + scale = constant_op.constant([3.0] * batch_size) + loc_v = 2.0 + scale_v = 3.0 + x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + log_pdf = laplace.log_prob(x) + self.assertEqual(log_pdf.get_shape(), (6,)) + if not stats: + return + expected_log_pdf = stats.laplace.logpdf(x, loc_v, scale=scale_v) + self.assertAllClose(log_pdf.eval(), expected_log_pdf) + + pdf = laplace.prob(x) + self.assertEqual(pdf.get_shape(), (6,)) + self.assertAllClose(pdf.eval(), np.exp(expected_log_pdf)) + + def testLaplaceLogPDFMultidimensional(self): + with 
self.test_session(): + batch_size = 6 + loc = constant_op.constant([[2.0, 4.0]] * batch_size) + scale = constant_op.constant([[3.0, 4.0]] * batch_size) + loc_v = np.array([2.0, 4.0]) + scale_v = np.array([3.0, 4.0]) + x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + log_pdf = laplace.log_prob(x) + log_pdf_values = log_pdf.eval() + self.assertEqual(log_pdf.get_shape(), (6, 2)) + + pdf = laplace.prob(x) + pdf_values = pdf.eval() + self.assertEqual(pdf.get_shape(), (6, 2)) + if not stats: + return + expected_log_pdf = stats.laplace.logpdf(x, loc_v, scale=scale_v) + self.assertAllClose(log_pdf_values, expected_log_pdf) + self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) + + def testLaplaceLogPDFMultidimensionalBroadcasting(self): + with self.test_session(): + batch_size = 6 + loc = constant_op.constant([[2.0, 4.0]] * batch_size) + scale = constant_op.constant(3.0) + loc_v = np.array([2.0, 4.0]) + scale_v = 3.0 + x = np.array([[2.5, 2.5, 4.0, 0.1, 1.0, 2.0]], dtype=np.float32).T + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + log_pdf = laplace.log_prob(x) + log_pdf_values = log_pdf.eval() + self.assertEqual(log_pdf.get_shape(), (6, 2)) + + pdf = laplace.prob(x) + pdf_values = pdf.eval() + self.assertEqual(pdf.get_shape(), (6, 2)) + if not stats: + return + expected_log_pdf = stats.laplace.logpdf(x, loc_v, scale=scale_v) + self.assertAllClose(log_pdf_values, expected_log_pdf) + self.assertAllClose(pdf_values, np.exp(expected_log_pdf)) + + def testLaplaceCDF(self): + with self.test_session(): + batch_size = 6 + loc = constant_op.constant([2.0] * batch_size) + scale = constant_op.constant([3.0] * batch_size) + loc_v = 2.0 + scale_v = 3.0 + x = np.array([2.5, 2.5, 4.0, 0.1, 1.0, 2.0], dtype=np.float32) + + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + + cdf = laplace.cdf(x) + self.assertEqual(cdf.get_shape(), (6,)) + if not stats: + return + expected_cdf = stats.laplace.cdf(x, loc_v, scale=scale_v) + self.assertAllClose(cdf.eval(), expected_cdf) + + def testLaplaceLogCDF(self): + with self.test_session(): + batch_size = 6 + loc = constant_op.constant([2.0] * batch_size) + scale = constant_op.constant([3.0] * batch_size) + loc_v = 2.0 + scale_v = 3.0 + x = np.array([-2.5, 2.5, -4.0, 0.1, 1.0, 2.0], dtype=np.float32) + + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + + cdf = laplace.log_cdf(x) + self.assertEqual(cdf.get_shape(), (6,)) + if not stats: + return + expected_cdf = stats.laplace.logcdf(x, loc_v, scale=scale_v) + self.assertAllClose(cdf.eval(), expected_cdf) + + def testLaplaceLogSurvivalFunction(self): + with self.test_session(): + batch_size = 6 + loc = constant_op.constant([2.0] * batch_size) + scale = constant_op.constant([3.0] * batch_size) + loc_v = 2.0 + scale_v = 3.0 + x = np.array([-2.5, 2.5, -4.0, 0.1, 1.0, 2.0], dtype=np.float32) + + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + + sf = laplace.log_survival_function(x) + self.assertEqual(sf.get_shape(), (6,)) + if not stats: + return + expected_sf = stats.laplace.logsf(x, loc_v, scale=scale_v) + self.assertAllClose(sf.eval(), expected_sf) + + def testLaplaceMean(self): + with self.test_session(): + loc_v = np.array([1.0, 3.0, 2.5]) + scale_v = np.array([1.0, 4.0, 5.0]) + laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) + self.assertEqual(laplace.mean().get_shape(), (3,)) + if not stats: + return + expected_means = stats.laplace.mean(loc_v, scale=scale_v) + self.assertAllClose(laplace.mean().eval(), 
expected_means) + + def testLaplaceMode(self): + with self.test_session(): + loc_v = np.array([0.5, 3.0, 2.5]) + scale_v = np.array([1.0, 4.0, 5.0]) + laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) + self.assertEqual(laplace.mode().get_shape(), (3,)) + self.assertAllClose(laplace.mode().eval(), loc_v) + + def testLaplaceVariance(self): + with self.test_session(): + loc_v = np.array([1.0, 3.0, 2.5]) + scale_v = np.array([1.0, 4.0, 5.0]) + laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) + self.assertEqual(laplace.variance().get_shape(), (3,)) + if not stats: + return + expected_variances = stats.laplace.var(loc_v, scale=scale_v) + self.assertAllClose(laplace.variance().eval(), expected_variances) + + def testLaplaceStd(self): + with self.test_session(): + loc_v = np.array([1.0, 3.0, 2.5]) + scale_v = np.array([1.0, 4.0, 5.0]) + laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) + self.assertEqual(laplace.stddev().get_shape(), (3,)) + if not stats: + return + expected_stddev = stats.laplace.std(loc_v, scale=scale_v) + self.assertAllClose(laplace.stddev().eval(), expected_stddev) + + def testLaplaceEntropy(self): + with self.test_session(): + loc_v = np.array([1.0, 3.0, 2.5]) + scale_v = np.array([1.0, 4.0, 5.0]) + laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) + self.assertEqual(laplace.entropy().get_shape(), (3,)) + if not stats: + return + expected_entropy = stats.laplace.entropy(loc_v, scale=scale_v) + self.assertAllClose(laplace.entropy().eval(), expected_entropy) + + def testLaplaceSample(self): + with session.Session(): + loc_v = 4.0 + scale_v = 3.0 + loc = constant_op.constant(loc_v) + scale = constant_op.constant(scale_v) + n = 100000 + laplace = laplace_lib.Laplace(loc=loc, scale=scale) + samples = laplace.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(samples.get_shape(), (n,)) + self.assertEqual(sample_values.shape, (n,)) + if not stats: + return + self.assertAllClose( + sample_values.mean(), + stats.laplace.mean( + loc_v, scale=scale_v), + rtol=0.05, + atol=0.) + self.assertAllClose( + sample_values.var(), + stats.laplace.var(loc_v, scale=scale_v), + rtol=0.05, + atol=0.) + self.assertTrue(self._kstest(loc_v, scale_v, sample_values)) + + def testLaplaceSampleMultiDimensional(self): + with session.Session(): + loc_v = np.array([np.arange(1, 101, dtype=np.float32)]) # 1 x 100 + scale_v = np.array([np.arange(1, 11, dtype=np.float32)]).T # 10 x 1 + laplace = laplace_lib.Laplace(loc=loc_v, scale=scale_v) + n = 10000 + samples = laplace.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(samples.get_shape(), (n, 10, 100)) + self.assertEqual(sample_values.shape, (n, 10, 100)) + zeros = np.zeros_like(loc_v + scale_v) # 10 x 100 + loc_bc = loc_v + zeros + scale_bc = scale_v + zeros + if not stats: + return + self.assertAllClose( + sample_values.mean(axis=0), + stats.laplace.mean( + loc_bc, scale=scale_bc), + rtol=0.35, + atol=0.) + self.assertAllClose( + sample_values.var(axis=0), + stats.laplace.var(loc_bc, scale=scale_bc), + rtol=0.10, + atol=0.) + fails = 0 + trials = 0 + for ai, a in enumerate(np.reshape(loc_v, [-1])): + for bi, b in enumerate(np.reshape(scale_v, [-1])): + s = sample_values[:, bi, ai] + trials += 1 + fails += 0 if self._kstest(a, b, s) else 1 + self.assertLess(fails, trials * 0.03) + + def _kstest(self, loc, scale, samples): + # Uses the Kolmogorov-Smirnov test for goodness of fit. 
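+    # Compares the empirical CDF of the samples with the analytic Laplace CDF.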
+ if not stats: + return True # If scipy isn't available, return "True" for passing + ks, _ = stats.kstest(samples, stats.laplace(loc, scale=scale).cdf) + # Return True when the test passes. + return ks < 0.02 + + def testLaplacePdfOfSampleMultiDims(self): + with session.Session() as sess: + laplace = laplace_lib.Laplace(loc=[7., 11.], scale=[[5.], [6.]]) + num = 50000 + samples = laplace.sample(num, seed=137) + pdfs = laplace.prob(samples) + sample_vals, pdf_vals = sess.run([samples, pdfs]) + self.assertEqual(samples.get_shape(), (num, 2, 2)) + self.assertEqual(pdfs.get_shape(), (num, 2, 2)) + self._assertIntegral(sample_vals[:, 0, 0], pdf_vals[:, 0, 0], err=0.02) + self._assertIntegral(sample_vals[:, 0, 1], pdf_vals[:, 0, 1], err=0.02) + self._assertIntegral(sample_vals[:, 1, 0], pdf_vals[:, 1, 0], err=0.02) + self._assertIntegral(sample_vals[:, 1, 1], pdf_vals[:, 1, 1], err=0.02) + if not stats: + return + self.assertAllClose( + stats.laplace.mean( + [[7., 11.], [7., 11.]], scale=np.array([[5., 5.], [6., 6.]])), + sample_vals.mean(axis=0), + rtol=0.05, + atol=0.) + self.assertAllClose( + stats.laplace.var([[7., 11.], [7., 11.]], + scale=np.array([[5., 5.], [6., 6.]])), + sample_vals.var(axis=0), + rtol=0.05, + atol=0.) + + def _assertIntegral(self, sample_vals, pdf_vals, err=1e-3): + s_p = zip(sample_vals, pdf_vals) + prev = (0, 0) + total = 0 + for k in sorted(s_p, key=lambda x: x[0]): + pair_pdf = (k[1] + prev[1]) / 2 + total += (k[0] - prev[0]) * pair_pdf + prev = k + self.assertNear(1., total, err=err) + + def testLaplaceNonPositiveInitializationParamsRaises(self): + with self.test_session(): + loc_v = constant_op.constant(0.0, name="loc") + scale_v = constant_op.constant(-1.0, name="scale") + laplace = laplace_lib.Laplace( + loc=loc_v, scale=scale_v, validate_args=True) + with self.assertRaisesOpError("scale"): + laplace.mean().eval() + loc_v = constant_op.constant(1.0, name="loc") + scale_v = constant_op.constant(0.0, name="scale") + laplace = laplace_lib.Laplace( + loc=loc_v, scale=scale_v, validate_args=True) + with self.assertRaisesOpError("scale"): + laplace.mean().eval() + + def testLaplaceWithSoftplusScale(self): + with self.test_session(): + loc_v = constant_op.constant([0.0, 1.0], name="loc") + scale_v = constant_op.constant([-1.0, 2.0], name="scale") + laplace = laplace_lib.LaplaceWithSoftplusScale(loc=loc_v, scale=scale_v) + self.assertAllClose(nn_ops.softplus(scale_v).eval(), laplace.scale.eval()) + self.assertAllClose(loc_v.eval(), laplace.loc.eval()) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/multinomial_test.py b/tensorflow/python/kernel_tests/distributions/multinomial_test.py new file mode 100644 index 0000000000..80caf10391 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/multinomial_test.py @@ -0,0 +1,343 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np + +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import multinomial +from tensorflow.python.platform import test + + +class MultinomialTest(test.TestCase): + + def setUp(self): + self._rng = np.random.RandomState(42) + + def testSimpleShapes(self): + with self.test_session(): + p = [.1, .3, .6] + dist = multinomial.Multinomial(total_count=1., probs=p) + self.assertEqual(3, dist.event_shape_tensor().eval()) + self.assertAllEqual([], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([3]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([]), dist.batch_shape) + + def testComplexShapes(self): + with self.test_session(): + p = 0.5 * np.ones([3, 2, 2], dtype=np.float32) + n = [[3., 2], [4, 5], [6, 7]] + dist = multinomial.Multinomial(total_count=n, probs=p) + self.assertEqual(2, dist.event_shape_tensor().eval()) + self.assertAllEqual([3, 2], dist.batch_shape_tensor().eval()) + self.assertEqual(tensor_shape.TensorShape([2]), dist.event_shape) + self.assertEqual(tensor_shape.TensorShape([3, 2]), dist.batch_shape) + + def testN(self): + p = [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]] + n = [[3.], [4]] + with self.test_session(): + dist = multinomial.Multinomial(total_count=n, probs=p) + self.assertEqual((2, 1), dist.total_count.get_shape()) + self.assertAllClose(n, dist.total_count.eval()) + + def testP(self): + p = [[0.1, 0.2, 0.7]] + with self.test_session(): + dist = multinomial.Multinomial(total_count=3., probs=p) + self.assertEqual((1, 3), dist.probs.get_shape()) + self.assertEqual((1, 3), dist.logits.get_shape()) + self.assertAllClose(p, dist.probs.eval()) + + def testLogits(self): + p = np.array([[0.1, 0.2, 0.7]], dtype=np.float32) + logits = np.log(p) - 50. + with self.test_session(): + multinom = multinomial.Multinomial(total_count=3., logits=logits) + self.assertEqual((1, 3), multinom.probs.get_shape()) + self.assertEqual((1, 3), multinom.logits.get_shape()) + self.assertAllClose(p, multinom.probs.eval()) + self.assertAllClose(logits, multinom.logits.eval()) + + def testPmfandCountsAgree(self): + p = [[0.1, 0.2, 0.7]] + n = [[5.]] + with self.test_session(): + dist = multinomial.Multinomial(total_count=n, probs=p, validate_args=True) + dist.prob([2., 3, 0]).eval() + dist.prob([3., 0, 2]).eval() + with self.assertRaisesOpError("must be non-negative"): + dist.prob([-1., 4, 2]).eval() + with self.assertRaisesOpError("counts must sum to `self.total_count`"): + dist.prob([3., 3, 0]).eval() + + def testPmfNonIntegerCounts(self): + p = [[0.1, 0.2, 0.7]] + n = [[5.]] + with self.test_session(): + # No errors with integer n. + multinom = multinomial.Multinomial( + total_count=n, probs=p, validate_args=True) + multinom.prob([2., 1, 2]).eval() + multinom.prob([3., 0, 2]).eval() + # Counts don't sum to n. + with self.assertRaisesOpError("counts must sum to `self.total_count`"): + multinom.prob([2., 3, 2]).eval() + # Counts are non-integers. 
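+      # With validate_args=True the fractional feed below must raise; the
+      # distribution built afterwards with validate_args=False accepts it.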
+ x = array_ops.placeholder(dtypes.float32) + with self.assertRaisesOpError( + "cannot contain fractional components."): + multinom.prob(x).eval(feed_dict={x: [1.0, 2.5, 1.5]}) + + multinom = multinomial.Multinomial( + total_count=n, probs=p, validate_args=False) + multinom.prob([1., 2., 2.]).eval() + # Non-integer arguments work. + multinom.prob([1.0, 2.5, 1.5]).eval() + + def testPmfBothZeroBatches(self): + with self.test_session(): + # Both zero-batches. No broadcast + p = [0.5, 0.5] + counts = [1., 0] + pmf = multinomial.Multinomial(total_count=1., probs=p).prob(counts) + self.assertAllClose(0.5, pmf.eval()) + self.assertEqual((), pmf.get_shape()) + + def testPmfBothZeroBatchesNontrivialN(self): + with self.test_session(): + # Both zero-batches. No broadcast + p = [0.1, 0.9] + counts = [3., 2] + dist = multinomial.Multinomial(total_count=5., probs=p) + pmf = dist.prob(counts) + # 5 choose 3 = 5 choose 2 = 10. 10 * (.9)^2 * (.1)^3 = 81/10000. + self.assertAllClose(81. / 10000, pmf.eval()) + self.assertEqual((), pmf.get_shape()) + + def testPmfPStretchedInBroadcastWhenSameRank(self): + with self.test_session(): + p = [[0.1, 0.9]] + counts = [[1., 0], [0, 1]] + pmf = multinomial.Multinomial(total_count=1., probs=p).prob(counts) + self.assertAllClose([0.1, 0.9], pmf.eval()) + self.assertEqual((2), pmf.get_shape()) + + def testPmfPStretchedInBroadcastWhenLowerRank(self): + with self.test_session(): + p = [0.1, 0.9] + counts = [[1., 0], [0, 1]] + pmf = multinomial.Multinomial(total_count=1., probs=p).prob(counts) + self.assertAllClose([0.1, 0.9], pmf.eval()) + self.assertEqual((2), pmf.get_shape()) + + def testPmfCountsStretchedInBroadcastWhenSameRank(self): + with self.test_session(): + p = [[0.1, 0.9], [0.7, 0.3]] + counts = [[1., 0]] + pmf = multinomial.Multinomial(total_count=1., probs=p).prob(counts) + self.assertAllClose(pmf.eval(), [0.1, 0.7]) + self.assertEqual((2), pmf.get_shape()) + + def testPmfCountsStretchedInBroadcastWhenLowerRank(self): + with self.test_session(): + p = [[0.1, 0.9], [0.7, 0.3]] + counts = [1., 0] + pmf = multinomial.Multinomial(total_count=1., probs=p).prob(counts) + self.assertAllClose(pmf.eval(), [0.1, 0.7]) + self.assertEqual(pmf.get_shape(), (2)) + + def testPmfShapeCountsStretchedN(self): + with self.test_session(): + # [2, 2, 2] + p = [[[0.1, 0.9], [0.1, 0.9]], [[0.7, 0.3], [0.7, 0.3]]] + # [2, 2] + n = [[3., 3], [3, 3]] + # [2] + counts = [2., 1] + pmf = multinomial.Multinomial(total_count=n, probs=p).prob(counts) + pmf.eval() + self.assertEqual(pmf.get_shape(), (2, 2)) + + def testPmfShapeCountsPStretchedN(self): + with self.test_session(): + p = [0.1, 0.9] + counts = [3., 2] + n = np.full([4, 3], 5., dtype=np.float32) + pmf = multinomial.Multinomial(total_count=n, probs=p).prob(counts) + pmf.eval() + self.assertEqual((4, 3), pmf.get_shape()) + + def testMultinomialMean(self): + with self.test_session(): + n = 5. + p = [0.1, 0.2, 0.7] + dist = multinomial.Multinomial(total_count=n, probs=p) + expected_means = 5 * np.array(p, dtype=np.float32) + self.assertEqual((3,), dist.mean().get_shape()) + self.assertAllClose(expected_means, dist.mean().eval()) + + def testMultinomialCovariance(self): + with self.test_session(): + n = 5. + p = [0.1, 0.2, 0.7] + dist = multinomial.Multinomial(total_count=n, probs=p) + expected_covariances = [[9. 
/ 20, -1 / 10, -7 / 20], + [-1 / 10, 4 / 5, -7 / 10], + [-7 / 20, -7 / 10, 21 / 20]] + self.assertEqual((3, 3), dist.covariance().get_shape()) + self.assertAllClose(expected_covariances, dist.covariance().eval()) + + def testMultinomialCovarianceBatch(self): + with self.test_session(): + # Shape [2] + n = [5.] * 2 + # Shape [4, 1, 2] + p = [[[0.1, 0.9]], [[0.1, 0.9]]] * 2 + dist = multinomial.Multinomial(total_count=n, probs=p) + # Shape [2, 2] + inner_var = [[9. / 20, -9 / 20], [-9 / 20, 9 / 20]] + # Shape [4, 2, 2, 2] + expected_covariances = [[inner_var, inner_var]] * 4 + self.assertEqual((4, 2, 2, 2), dist.covariance().get_shape()) + self.assertAllClose(expected_covariances, dist.covariance().eval()) + + def testCovarianceMultidimensional(self): + # Shape [3, 5, 4] + p = np.random.dirichlet([.25, .25, .25, .25], [3, 5]).astype(np.float32) + # Shape [6, 3, 3] + p2 = np.random.dirichlet([.3, .3, .4], [6, 3]).astype(np.float32) + + ns = np.random.randint(low=1, high=11, size=[3, 5]).astype(np.float32) + ns2 = np.random.randint(low=1, high=11, size=[6, 1]).astype(np.float32) + + with self.test_session(): + dist = multinomial.Multinomial(ns, p) + dist2 = multinomial.Multinomial(ns2, p2) + + covariance = dist.covariance() + covariance2 = dist2.covariance() + self.assertEqual((3, 5, 4, 4), covariance.get_shape()) + self.assertEqual((6, 3, 3, 3), covariance2.get_shape()) + + def testCovarianceFromSampling(self): + # We will test mean, cov, var, stddev on a DirichletMultinomial constructed + # via broadcast between alpha, n. + theta = np.array([[1., 2, 3], + [2.5, 4, 0.01]], dtype=np.float32) + theta /= np.sum(theta, 1)[..., array_ops.newaxis] + # Ideally we'd be able to test broadcasting but, the multinomial sampler + # doesn't support different total counts. + n = np.float32(5) + with self.test_session() as sess: + # batch_shape=[2], event_shape=[3] + dist = multinomial.Multinomial(n, theta) + x = dist.sample(int(250e3), seed=1) + sample_mean = math_ops.reduce_mean(x, 0) + x_centered = x - sample_mean[array_ops.newaxis, ...] + sample_cov = math_ops.reduce_mean(math_ops.matmul( + x_centered[..., array_ops.newaxis], + x_centered[..., array_ops.newaxis, :]), 0) + sample_var = array_ops.matrix_diag_part(sample_cov) + sample_stddev = math_ops.sqrt(sample_var) + [ + sample_mean_, + sample_cov_, + sample_var_, + sample_stddev_, + analytic_mean, + analytic_cov, + analytic_var, + analytic_stddev, + ] = sess.run([ + sample_mean, + sample_cov, + sample_var, + sample_stddev, + dist.mean(), + dist.covariance(), + dist.variance(), + dist.stddev(), + ]) + self.assertAllClose(sample_mean_, analytic_mean, atol=0., rtol=0.01) + self.assertAllClose(sample_cov_, analytic_cov, atol=0., rtol=0.01) + self.assertAllClose(sample_var_, analytic_var, atol=0., rtol=0.01) + self.assertAllClose(sample_stddev_, analytic_stddev, atol=0., rtol=0.01) + + def testSampleUnbiasedNonScalarBatch(self): + with self.test_session() as sess: + dist = multinomial.Multinomial( + total_count=5., + logits=math_ops.log(2. * self._rng.rand(4, 3, 2).astype(np.float32))) + n = int(3e3) + x = dist.sample(n, seed=0) + sample_mean = math_ops.reduce_mean(x, 0) + # Cyclically rotate event dims left. 
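+      # x_centered then has shape [4, 3, 2, n], so the matmul below averages
+      # outer products over the n samples.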
+ x_centered = array_ops.transpose(x - sample_mean, [1, 2, 3, 0]) + sample_covariance = math_ops.matmul( + x_centered, x_centered, adjoint_b=True) / n + [ + sample_mean_, + sample_covariance_, + actual_mean_, + actual_covariance_, + ] = sess.run([ + sample_mean, + sample_covariance, + dist.mean(), + dist.covariance(), + ]) + self.assertAllEqual([4, 3, 2], sample_mean.get_shape()) + self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.07) + self.assertAllEqual([4, 3, 2, 2], sample_covariance.get_shape()) + self.assertAllClose( + actual_covariance_, sample_covariance_, atol=0., rtol=0.10) + + def testSampleUnbiasedScalarBatch(self): + with self.test_session() as sess: + dist = multinomial.Multinomial( + total_count=5., + logits=math_ops.log(2. * self._rng.rand(4).astype(np.float32))) + n = int(5e3) + x = dist.sample(n, seed=0) + sample_mean = math_ops.reduce_mean(x, 0) + x_centered = x - sample_mean # Already transposed to [n, 2]. + sample_covariance = math_ops.matmul( + x_centered, x_centered, adjoint_a=True) / n + [ + sample_mean_, + sample_covariance_, + actual_mean_, + actual_covariance_, + ] = sess.run([ + sample_mean, + sample_covariance, + dist.mean(), + dist.covariance(), + ]) + self.assertAllEqual([4], sample_mean.get_shape()) + self.assertAllClose(actual_mean_, sample_mean_, atol=0., rtol=0.07) + self.assertAllEqual([4, 4], sample_covariance.get_shape()) + self.assertAllClose( + actual_covariance_, sample_covariance_, atol=0., rtol=0.10) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/student_t_test.py b/tensorflow/python/kernel_tests/distributions/student_t_test.py new file mode 100644 index 0000000000..f1150de58e --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/student_t_test.py @@ -0,0 +1,516 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tests for Student t distribution.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib +import math + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import random_seed +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops.distributions import student_t +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +stats = try_import("scipy.stats") + + +class StudentTTest(test.TestCase): + + def testStudentPDFAndLogPDF(self): + with self.test_session(): + batch_size = 6 + df = constant_op.constant([3.] * batch_size) + mu = constant_op.constant([7.] 
* batch_size) + sigma = constant_op.constant([8.] * batch_size) + df_v = 3. + mu_v = 7. + sigma_v = 8. + t = np.array([-2.5, 2.5, 8., 0., -1., 2.], dtype=np.float32) + student = student_t.StudentT(df, loc=mu, scale=-sigma) + + log_pdf = student.log_prob(t) + self.assertEquals(log_pdf.get_shape(), (6,)) + log_pdf_values = log_pdf.eval() + pdf = student.prob(t) + self.assertEquals(pdf.get_shape(), (6,)) + pdf_values = pdf.eval() + + if not stats: + return + + expected_log_pdf = stats.t.logpdf(t, df_v, loc=mu_v, scale=sigma_v) + expected_pdf = stats.t.pdf(t, df_v, loc=mu_v, scale=sigma_v) + self.assertAllClose(expected_log_pdf, log_pdf_values) + self.assertAllClose(np.log(expected_pdf), log_pdf_values) + self.assertAllClose(expected_pdf, pdf_values) + self.assertAllClose(np.exp(expected_log_pdf), pdf_values) + + def testStudentLogPDFMultidimensional(self): + with self.test_session(): + batch_size = 6 + df = constant_op.constant([[1.5, 7.2]] * batch_size) + mu = constant_op.constant([[3., -3.]] * batch_size) + sigma = constant_op.constant([[-math.sqrt(10.), math.sqrt(15.)]] * + batch_size) + df_v = np.array([1.5, 7.2]) + mu_v = np.array([3., -3.]) + sigma_v = np.array([np.sqrt(10.), np.sqrt(15.)]) + t = np.array([[-2.5, 2.5, 4., 0., -1., 2.]], dtype=np.float32).T + student = student_t.StudentT(df, loc=mu, scale=sigma) + log_pdf = student.log_prob(t) + log_pdf_values = log_pdf.eval() + self.assertEqual(log_pdf.get_shape(), (6, 2)) + pdf = student.prob(t) + pdf_values = pdf.eval() + self.assertEqual(pdf.get_shape(), (6, 2)) + + if not stats: + return + expected_log_pdf = stats.t.logpdf(t, df_v, loc=mu_v, scale=sigma_v) + expected_pdf = stats.t.pdf(t, df_v, loc=mu_v, scale=sigma_v) + self.assertAllClose(expected_log_pdf, log_pdf_values) + self.assertAllClose(np.log(expected_pdf), log_pdf_values) + self.assertAllClose(expected_pdf, pdf_values) + self.assertAllClose(np.exp(expected_log_pdf), pdf_values) + + def testStudentCDFAndLogCDF(self): + with self.test_session(): + batch_size = 6 + df = constant_op.constant([3.] * batch_size) + mu = constant_op.constant([7.] * batch_size) + sigma = constant_op.constant([-8.] * batch_size) + df_v = 3. + mu_v = 7. + sigma_v = 8. 
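+      # Note the negative scale above; the expected values below are computed
+      # with scipy using its magnitude, sigma_v = 8.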
+ t = np.array([-2.5, 2.5, 8., 0., -1., 2.], dtype=np.float32) + student = student_t.StudentT(df, loc=mu, scale=sigma) + + log_cdf = student.log_cdf(t) + self.assertEquals(log_cdf.get_shape(), (6,)) + log_cdf_values = log_cdf.eval() + cdf = student.cdf(t) + self.assertEquals(cdf.get_shape(), (6,)) + cdf_values = cdf.eval() + + if not stats: + return + expected_log_cdf = stats.t.logcdf(t, df_v, loc=mu_v, scale=sigma_v) + expected_cdf = stats.t.cdf(t, df_v, loc=mu_v, scale=sigma_v) + self.assertAllClose(expected_log_cdf, log_cdf_values, atol=0., rtol=1e-5) + self.assertAllClose( + np.log(expected_cdf), log_cdf_values, atol=0., rtol=1e-5) + self.assertAllClose(expected_cdf, cdf_values, atol=0., rtol=1e-5) + self.assertAllClose( + np.exp(expected_log_cdf), cdf_values, atol=0., rtol=1e-5) + + def testStudentEntropy(self): + df_v = np.array([[2., 3., 7.]]) # 1x3 + mu_v = np.array([[1., -1, 0]]) # 1x3 + sigma_v = np.array([[1., -2., 3.]]).T # transposed => 3x1 + with self.test_session(): + student = student_t.StudentT(df=df_v, loc=mu_v, scale=sigma_v) + ent = student.entropy() + ent_values = ent.eval() + + # Help scipy broadcast to 3x3 + ones = np.array([[1, 1, 1]]) + sigma_bc = np.abs(sigma_v) * ones + mu_bc = ones.T * mu_v + df_bc = ones.T * df_v + if not stats: + return + expected_entropy = stats.t.entropy( + np.reshape(df_bc, [-1]), + loc=np.reshape(mu_bc, [-1]), + scale=np.reshape(sigma_bc, [-1])) + expected_entropy = np.reshape(expected_entropy, df_bc.shape) + self.assertAllClose(expected_entropy, ent_values) + + def testStudentSample(self): + with self.test_session(): + df = constant_op.constant(4.) + mu = constant_op.constant(3.) + sigma = constant_op.constant(-math.sqrt(10.)) + df_v = 4. + mu_v = 3. + sigma_v = np.sqrt(10.) + n = constant_op.constant(200000) + student = student_t.StudentT(df=df, loc=mu, scale=sigma) + samples = student.sample(n, seed=123456) + sample_values = samples.eval() + n_val = 200000 + self.assertEqual(sample_values.shape, (n_val,)) + self.assertAllClose(sample_values.mean(), mu_v, rtol=1e-2, atol=0) + self.assertAllClose( + sample_values.var(), + sigma_v**2 * df_v / (df_v - 2), + rtol=1e-2, + atol=0) + self._checkKLApprox(df_v, mu_v, sigma_v, sample_values) + + # Test that sampling with the same seed twice gives the same results. + def testStudentSampleMultipleTimes(self): + with self.test_session(): + df = constant_op.constant(4.) + mu = constant_op.constant(3.) + sigma = constant_op.constant(math.sqrt(10.)) + n = constant_op.constant(100) + + random_seed.set_random_seed(654321) + student = student_t.StudentT( + df=df, loc=mu, scale=sigma, name="student_t1") + samples1 = student.sample(n, seed=123456).eval() + + random_seed.set_random_seed(654321) + student2 = student_t.StudentT( + df=df, loc=mu, scale=sigma, name="student_t2") + samples2 = student2.sample(n, seed=123456).eval() + + self.assertAllClose(samples1, samples2) + + def testStudentSampleSmallDfNoNan(self): + with self.test_session(): + df_v = [1e-1, 1e-5, 1e-10, 1e-20] + df = constant_op.constant(df_v) + n = constant_op.constant(200000) + student = student_t.StudentT(df=df, loc=1., scale=1.) 
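+      # Draw a large sample at each tiny df value and verify below that the
+      # sampler never produces NaNs.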
+ samples = student.sample(n, seed=123456) + sample_values = samples.eval() + n_val = 200000 + self.assertEqual(sample_values.shape, (n_val, 4)) + self.assertTrue(np.all(np.logical_not(np.isnan(sample_values)))) + + def testStudentSampleMultiDimensional(self): + with self.test_session(): + batch_size = 7 + df = constant_op.constant([[3., 7.]] * batch_size) + mu = constant_op.constant([[3., -3.]] * batch_size) + sigma = constant_op.constant([[math.sqrt(10.), math.sqrt(15.)]] * + batch_size) + df_v = [3., 7.] + mu_v = [3., -3.] + sigma_v = [np.sqrt(10.), np.sqrt(15.)] + n = constant_op.constant(200000) + student = student_t.StudentT(df=df, loc=mu, scale=sigma) + samples = student.sample(n, seed=123456) + sample_values = samples.eval() + self.assertEqual(samples.get_shape(), (200000, batch_size, 2)) + self.assertAllClose( + sample_values[:, 0, 0].mean(), mu_v[0], rtol=1e-2, atol=0) + self.assertAllClose( + sample_values[:, 0, 0].var(), + sigma_v[0]**2 * df_v[0] / (df_v[0] - 2), + rtol=1e-1, + atol=0) + self._checkKLApprox(df_v[0], mu_v[0], sigma_v[0], sample_values[:, 0, 0]) + self.assertAllClose( + sample_values[:, 0, 1].mean(), mu_v[1], rtol=1e-2, atol=0) + self.assertAllClose( + sample_values[:, 0, 1].var(), + sigma_v[1]**2 * df_v[1] / (df_v[1] - 2), + rtol=1e-1, + atol=0) + self._checkKLApprox(df_v[0], mu_v[0], sigma_v[0], sample_values[:, 0, 1]) + + def _checkKLApprox(self, df, mu, sigma, samples): + n = samples.size + np.random.seed(137) + if not stats: + return + sample_scipy = stats.t.rvs(df, loc=mu, scale=sigma, size=n) + covg = 0.99 + r = stats.t.interval(covg, df, loc=mu, scale=sigma) + bins = 100 + hist, _ = np.histogram(samples, bins=bins, range=r) + hist_scipy, _ = np.histogram(sample_scipy, bins=bins, range=r) + self.assertGreater(hist.sum(), n * (covg - .01)) + self.assertGreater(hist_scipy.sum(), n * (covg - .01)) + hist_min1 = hist + 1. # put at least one item in each bucket + hist_norm = hist_min1 / hist_min1.sum() + hist_scipy_min1 = hist_scipy + 1. 
# put at least one item in each bucket + hist_scipy_norm = hist_scipy_min1 / hist_scipy_min1.sum() + kl_appx = np.sum(np.log(hist_scipy_norm / hist_norm) * hist_scipy_norm) + self.assertLess(kl_appx, 1) + + def testBroadcastingParams(self): + + def _check(student): + self.assertEqual(student.mean().get_shape(), (3,)) + self.assertEqual(student.variance().get_shape(), (3,)) + self.assertEqual(student.entropy().get_shape(), (3,)) + self.assertEqual(student.log_prob(2.).get_shape(), (3,)) + self.assertEqual(student.prob(2.).get_shape(), (3,)) + self.assertEqual(student.sample(37, seed=123456).get_shape(), (37, 3,)) + + _check(student_t.StudentT(df=[2., 3., 4.,], loc=2., scale=1.)) + _check(student_t.StudentT(df=7., loc=[2., 3., 4.,], scale=1.)) + _check(student_t.StudentT(df=7., loc=3., scale=[2., 3., 4.,])) + + def testBroadcastingPdfArgs(self): + + def _assert_shape(student, arg, shape): + self.assertEqual(student.log_prob(arg).get_shape(), shape) + self.assertEqual(student.prob(arg).get_shape(), shape) + + def _check(student): + _assert_shape(student, 2., (3,)) + xs = np.array([2., 3., 4.], dtype=np.float32) + _assert_shape(student, xs, (3,)) + xs = np.array([xs]) + _assert_shape(student, xs, (1, 3)) + xs = xs.T + _assert_shape(student, xs, (3, 3)) + + _check(student_t.StudentT(df=[2., 3., 4.,], loc=2., scale=1.)) + _check(student_t.StudentT(df=7., loc=[2., 3., 4.,], scale=1.)) + _check(student_t.StudentT(df=7., loc=3., scale=[2., 3., 4.,])) + + def _check2d(student): + _assert_shape(student, 2., (1, 3)) + xs = np.array([2., 3., 4.], dtype=np.float32) + _assert_shape(student, xs, (1, 3)) + xs = np.array([xs]) + _assert_shape(student, xs, (1, 3)) + xs = xs.T + _assert_shape(student, xs, (3, 3)) + + _check2d(student_t.StudentT(df=[[2., 3., 4.,]], loc=2., scale=1.)) + _check2d(student_t.StudentT(df=7., loc=[[2., 3., 4.,]], scale=1.)) + _check2d(student_t.StudentT(df=7., loc=3., scale=[[2., 3., 4.,]])) + + def _check2d_rows(student): + _assert_shape(student, 2., (3, 1)) + xs = np.array([2., 3., 4.], dtype=np.float32) # (3,) + _assert_shape(student, xs, (3, 3)) + xs = np.array([xs]) # (1,3) + _assert_shape(student, xs, (3, 3)) + xs = xs.T # (3,1) + _assert_shape(student, xs, (3, 1)) + + _check2d_rows(student_t.StudentT(df=[[2.], [3.], [4.]], loc=2., scale=1.)) + _check2d_rows(student_t.StudentT(df=7., loc=[[2.], [3.], [4.]], scale=1.)) + _check2d_rows(student_t.StudentT(df=7., loc=3., scale=[[2.], [3.], [4.]])) + + def testMeanAllowNanStatsIsFalseWorksWhenAllBatchMembersAreDefined(self): + with self.test_session(): + mu = [1., 3.3, 4.4] + student = student_t.StudentT(df=[3., 5., 7.], loc=mu, scale=[3., 2., 1.]) + mean = student.mean().eval() + self.assertAllClose([1., 3.3, 4.4], mean) + + def testMeanAllowNanStatsIsFalseRaisesWhenBatchMemberIsUndefined(self): + with self.test_session(): + mu = [1., 3.3, 4.4] + student = student_t.StudentT( + df=[0.5, 5., 7.], loc=mu, scale=[3., 2., 1.], + allow_nan_stats=False) + with self.assertRaisesOpError("x < y"): + student.mean().eval() + + def testMeanAllowNanStatsIsTrueReturnsNaNForUndefinedBatchMembers(self): + with self.test_session(): + mu = [-2, 0., 1., 3.3, 4.4] + sigma = [5., 4., 3., 2., 1.] + student = student_t.StudentT( + df=[0.5, 1., 3., 5., 7.], loc=mu, scale=sigma, + allow_nan_stats=True) + mean = student.mean().eval() + self.assertAllClose([np.nan, np.nan, 1., 3.3, 4.4], mean) + + def testVarianceAllowNanStatsTrueReturnsNaNforUndefinedBatchMembers(self): + with self.test_session(): + # df = 0.5 ==> undefined mean ==> undefined variance. 
+ # df = 1.5 ==> infinite variance. + df = [0.5, 1.5, 3., 5., 7.] + mu = [-2, 0., 1., 3.3, 4.4] + sigma = [5., 4., 3., 2., 1.] + student = student_t.StudentT( + df=df, loc=mu, scale=sigma, allow_nan_stats=True) + var = student.variance().eval() + ## scipy uses inf for variance when the mean is undefined. When mean is + # undefined we say variance is undefined as well. So test the first + # member of var, making sure it is NaN, then replace with inf and compare + # to scipy. + self.assertTrue(np.isnan(var[0])) + var[0] = np.inf + + if not stats: + return + expected_var = [ + stats.t.var(d, loc=m, scale=s) for (d, m, s) in zip(df, mu, sigma) + ] + self.assertAllClose(expected_var, var) + + def testVarianceAllowNanStatsFalseGivesCorrectValueForDefinedBatchMembers( + self): + with self.test_session(): + # df = 1.5 ==> infinite variance. + df = [1.5, 3., 5., 7.] + mu = [0., 1., 3.3, 4.4] + sigma = [4., 3., 2., 1.] + student = student_t.StudentT(df=df, loc=mu, scale=sigma) + var = student.variance().eval() + + if not stats: + return + expected_var = [ + stats.t.var(d, loc=m, scale=s) for (d, m, s) in zip(df, mu, sigma) + ] + self.assertAllClose(expected_var, var) + + def testVarianceAllowNanStatsFalseRaisesForUndefinedBatchMembers(self): + with self.test_session(): + # df <= 1 ==> variance not defined + student = student_t.StudentT( + df=1., loc=0., scale=1., allow_nan_stats=False) + with self.assertRaisesOpError("x < y"): + student.variance().eval() + + with self.test_session(): + # df <= 1 ==> variance not defined + student = student_t.StudentT( + df=0.5, loc=0., scale=1., allow_nan_stats=False) + with self.assertRaisesOpError("x < y"): + student.variance().eval() + + def testStd(self): + with self.test_session(): + # Defined for all batch members. + df = [3.5, 5., 3., 5., 7.] + mu = [-2.2] + sigma = [5., 4., 3., 2., 1.] + student = student_t.StudentT(df=df, loc=mu, scale=sigma) + # Test broadcast of mu across shape of df/sigma + stddev = student.stddev().eval() + mu *= len(df) + + if not stats: + return + expected_stddev = [ + stats.t.std(d, loc=m, scale=s) for (d, m, s) in zip(df, mu, sigma) + ] + self.assertAllClose(expected_stddev, stddev) + + def testMode(self): + with self.test_session(): + df = [0.5, 1., 3] + mu = [-1, 0., 1] + sigma = [5., 4., 3.] + student = student_t.StudentT(df=df, loc=mu, scale=sigma) + # Test broadcast of mu across shape of df/sigma + mode = student.mode().eval() + self.assertAllClose([-1., 0, 1], mode) + + def testPdfOfSample(self): + with self.test_session() as sess: + student = student_t.StudentT(df=3., loc=np.pi, scale=1.) + num = 20000 + samples = student.sample(num, seed=123456) + pdfs = student.prob(samples) + mean = student.mean() + mean_pdf = student.prob(student.mean()) + sample_vals, pdf_vals, mean_val, mean_pdf_val = sess.run( + [samples, pdfs, student.mean(), mean_pdf]) + self.assertEqual(samples.get_shape(), (num,)) + self.assertEqual(pdfs.get_shape(), (num,)) + self.assertEqual(mean.get_shape(), ()) + self.assertNear(np.pi, np.mean(sample_vals), err=0.02) + self.assertNear(np.pi, mean_val, err=1e-6) + # Verify integral over sample*pdf ~= 1. + self._assertIntegral(sample_vals, pdf_vals, err=2e-3) + if not stats: + return + self.assertNear(stats.t.pdf(np.pi, 3., loc=np.pi), mean_pdf_val, err=1e-6) + + def testPdfOfSampleMultiDims(self): + with self.test_session() as sess: + student = student_t.StudentT(df=[7., 11.], loc=[[5.], [6.]], scale=3.) 
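+      # df (shape [2]) and loc (shape [2, 1]) broadcast to batch_shape [2, 2],
+      # which the shape assertions below verify.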
+ self.assertAllEqual([], student.event_shape) + self.assertAllEqual([], student.event_shape_tensor().eval()) + self.assertAllEqual([2, 2], student.batch_shape) + self.assertAllEqual([2, 2], student.batch_shape_tensor().eval()) + num = 50000 + samples = student.sample(num, seed=123456) + pdfs = student.prob(samples) + sample_vals, pdf_vals = sess.run([samples, pdfs]) + self.assertEqual(samples.get_shape(), (num, 2, 2)) + self.assertEqual(pdfs.get_shape(), (num, 2, 2)) + self.assertNear(5., np.mean(sample_vals[:, 0, :]), err=.03) + self.assertNear(6., np.mean(sample_vals[:, 1, :]), err=.03) + self._assertIntegral(sample_vals[:, 0, 0], pdf_vals[:, 0, 0], err=0.02) + self._assertIntegral(sample_vals[:, 0, 1], pdf_vals[:, 0, 1], err=0.02) + self._assertIntegral(sample_vals[:, 1, 0], pdf_vals[:, 1, 0], err=0.02) + self._assertIntegral(sample_vals[:, 1, 1], pdf_vals[:, 1, 1], err=0.02) + if not stats: + return + self.assertNear( + stats.t.var(7., loc=0., scale=3.), # loc d.n. effect var + np.var(sample_vals[:, :, 0]), + err=.4) + self.assertNear( + stats.t.var(11., loc=0., scale=3.), # loc d.n. effect var + np.var(sample_vals[:, :, 1]), + err=.4) + + def _assertIntegral(self, sample_vals, pdf_vals, err=1.5e-3): + s_p = zip(sample_vals, pdf_vals) + prev = (sample_vals.min() - 1000, 0) + total = 0 + for k in sorted(s_p, key=lambda x: x[0]): + pair_pdf = (k[1] + prev[1]) / 2 + total += (k[0] - prev[0]) * pair_pdf + prev = k + self.assertNear(1., total, err=err) + + def testNegativeDofFails(self): + with self.test_session(): + student = student_t.StudentT(df=[2, -5.], loc=0., scale=1., + validate_args=True, name="S") + with self.assertRaisesOpError(r"Condition x > 0 did not hold"): + student.mean().eval() + + def testStudentTWithAbsDfSoftplusScale(self): + with self.test_session(): + df = constant_op.constant([-3.2, -4.6]) + mu = constant_op.constant([-4.2, 3.4]) + sigma = constant_op.constant([-6.4, -8.8]) + student = student_t.StudentTWithAbsDfSoftplusScale( + df=df, loc=mu, scale=sigma) + self.assertAllClose( + math_ops.floor(math_ops.abs(df)).eval(), student.df.eval()) + self.assertAllClose(mu.eval(), student.loc.eval()) + self.assertAllClose(nn_ops.softplus(sigma).eval(), student.scale.eval()) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/kernel_tests/distributions/uniform_test.py b/tensorflow/python/kernel_tests/distributions/uniform_test.py new file mode 100644 index 0000000000..df99a0ed25 --- /dev/null +++ b/tensorflow/python/kernel_tests/distributions/uniform_test.py @@ -0,0 +1,286 @@ +# Copyright 2015 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for Uniform distribution.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import importlib + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import errors +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops.distributions import uniform as uniform_lib +from tensorflow.python.platform import test +from tensorflow.python.platform import tf_logging + + +def try_import(name): # pylint: disable=invalid-name + module = None + try: + module = importlib.import_module(name) + except ImportError as e: + tf_logging.warning("Could not import %s: %s" % (name, str(e))) + return module + + +stats = try_import("scipy.stats") + + +class UniformTest(test.TestCase): + + def testUniformRange(self): + with self.test_session(): + a = 3.0 + b = 10.0 + uniform = uniform_lib.Uniform(low=a, high=b) + self.assertAllClose(a, uniform.low.eval()) + self.assertAllClose(b, uniform.high.eval()) + self.assertAllClose(b - a, uniform.range().eval()) + + def testUniformPDF(self): + with self.test_session(): + a = constant_op.constant([-3.0] * 5 + [15.0]) + b = constant_op.constant([11.0] * 5 + [20.0]) + uniform = uniform_lib.Uniform(low=a, high=b) + + a_v = -3.0 + b_v = 11.0 + x = np.array([-10.5, 4.0, 0.0, 10.99, 11.3, 17.0], dtype=np.float32) + + def _expected_pdf(): + pdf = np.zeros_like(x) + 1.0 / (b_v - a_v) + pdf[x > b_v] = 0.0 + pdf[x < a_v] = 0.0 + pdf[5] = 1.0 / (20.0 - 15.0) + return pdf + + expected_pdf = _expected_pdf() + + pdf = uniform.prob(x) + self.assertAllClose(expected_pdf, pdf.eval()) + + log_pdf = uniform.log_prob(x) + self.assertAllClose(np.log(expected_pdf), log_pdf.eval()) + + def testUniformShape(self): + with self.test_session(): + a = constant_op.constant([-3.0] * 5) + b = constant_op.constant(11.0) + uniform = uniform_lib.Uniform(low=a, high=b) + + self.assertEqual(uniform.batch_shape_tensor().eval(), (5,)) + self.assertEqual(uniform.batch_shape, tensor_shape.TensorShape([5])) + self.assertAllEqual(uniform.event_shape_tensor().eval(), []) + self.assertEqual(uniform.event_shape, tensor_shape.TensorShape([])) + + def testUniformPDFWithScalarEndpoint(self): + with self.test_session(): + a = constant_op.constant([0.0, 5.0]) + b = constant_op.constant(10.0) + uniform = uniform_lib.Uniform(low=a, high=b) + + x = np.array([0.0, 8.0], dtype=np.float32) + expected_pdf = np.array([1.0 / (10.0 - 0.0), 1.0 / (10.0 - 5.0)]) + + pdf = uniform.prob(x) + self.assertAllClose(expected_pdf, pdf.eval()) + + def testUniformCDF(self): + with self.test_session(): + batch_size = 6 + a = constant_op.constant([1.0] * batch_size) + b = constant_op.constant([11.0] * batch_size) + a_v = 1.0 + b_v = 11.0 + x = np.array([-2.5, 2.5, 4.0, 0.0, 10.99, 12.0], dtype=np.float32) + + uniform = uniform_lib.Uniform(low=a, high=b) + + def _expected_cdf(): + cdf = (x - a_v) / (b_v - a_v) + cdf[x >= b_v] = 1 + cdf[x < a_v] = 0 + return cdf + + cdf = uniform.cdf(x) + self.assertAllClose(_expected_cdf(), cdf.eval()) + + log_cdf = uniform.log_cdf(x) + self.assertAllClose(np.log(_expected_cdf()), log_cdf.eval()) + + def testUniformEntropy(self): + with self.test_session(): + a_v = np.array([1.0, 1.0, 1.0]) + b_v = np.array([[1.5, 2.0, 3.0]]) + uniform = uniform_lib.Uniform(low=a_v, high=b_v) + + 
expected_entropy = np.log(b_v - a_v) + self.assertAllClose(expected_entropy, uniform.entropy().eval()) + + def testUniformAssertMaxGtMin(self): + with self.test_session(): + a_v = np.array([1.0, 1.0, 1.0], dtype=np.float32) + b_v = np.array([1.0, 2.0, 3.0], dtype=np.float32) + uniform = uniform_lib.Uniform(low=a_v, high=b_v, validate_args=True) + + with self.assertRaisesWithPredicateMatch(errors.InvalidArgumentError, + "x < y"): + uniform.low.eval() + + def testUniformSample(self): + with self.test_session(): + a = constant_op.constant([3.0, 4.0]) + b = constant_op.constant(13.0) + a1_v = 3.0 + a2_v = 4.0 + b_v = 13.0 + n = constant_op.constant(100000) + uniform = uniform_lib.Uniform(low=a, high=b) + + samples = uniform.sample(n, seed=137) + sample_values = samples.eval() + self.assertEqual(sample_values.shape, (100000, 2)) + self.assertAllClose( + sample_values[::, 0].mean(), (b_v + a1_v) / 2, atol=1e-2) + self.assertAllClose( + sample_values[::, 1].mean(), (b_v + a2_v) / 2, atol=1e-2) + self.assertFalse( + np.any(sample_values[::, 0] < a1_v) or np.any(sample_values >= b_v)) + self.assertFalse( + np.any(sample_values[::, 1] < a2_v) or np.any(sample_values >= b_v)) + + def _testUniformSampleMultiDimensional(self): + # DISABLED: Please enable this test once b/issues/30149644 is resolved. + with self.test_session(): + batch_size = 2 + a_v = [3.0, 22.0] + b_v = [13.0, 35.0] + a = constant_op.constant([a_v] * batch_size) + b = constant_op.constant([b_v] * batch_size) + + uniform = uniform_lib.Uniform(low=a, high=b) + + n_v = 100000 + n = constant_op.constant(n_v) + samples = uniform.sample(n) + self.assertEqual(samples.get_shape(), (n_v, batch_size, 2)) + + sample_values = samples.eval() + + self.assertFalse( + np.any(sample_values[:, 0, 0] < a_v[0]) or + np.any(sample_values[:, 0, 0] >= b_v[0])) + self.assertFalse( + np.any(sample_values[:, 0, 1] < a_v[1]) or + np.any(sample_values[:, 0, 1] >= b_v[1])) + + self.assertAllClose( + sample_values[:, 0, 0].mean(), (a_v[0] + b_v[0]) / 2, atol=1e-2) + self.assertAllClose( + sample_values[:, 0, 1].mean(), (a_v[1] + b_v[1]) / 2, atol=1e-2) + + def testUniformMean(self): + with self.test_session(): + a = 10.0 + b = 100.0 + uniform = uniform_lib.Uniform(low=a, high=b) + if not stats: + return + s_uniform = stats.uniform(loc=a, scale=b - a) + self.assertAllClose(uniform.mean().eval(), s_uniform.mean()) + + def testUniformVariance(self): + with self.test_session(): + a = 10.0 + b = 100.0 + uniform = uniform_lib.Uniform(low=a, high=b) + if not stats: + return + s_uniform = stats.uniform(loc=a, scale=b - a) + self.assertAllClose(uniform.variance().eval(), s_uniform.var()) + + def testUniformStd(self): + with self.test_session(): + a = 10.0 + b = 100.0 + uniform = uniform_lib.Uniform(low=a, high=b) + if not stats: + return + s_uniform = stats.uniform(loc=a, scale=b - a) + self.assertAllClose(uniform.stddev().eval(), s_uniform.std()) + + def testUniformNans(self): + with self.test_session(): + a = 10.0 + b = [11.0, 100.0] + uniform = uniform_lib.Uniform(low=a, high=b) + + no_nans = constant_op.constant(1.0) + nans = constant_op.constant(0.0) / constant_op.constant(0.0) + self.assertTrue(math_ops.is_nan(nans).eval()) + with_nans = array_ops.stack([no_nans, nans]) + + pdf = uniform.prob(with_nans) + + is_nan = math_ops.is_nan(pdf).eval() + self.assertFalse(is_nan[0]) + self.assertTrue(is_nan[1]) + + def testUniformSamplePdf(self): + with self.test_session(): + a = 10.0 + b = [11.0, 100.0] + uniform = uniform_lib.Uniform(a, b) + self.assertTrue( + 
math_ops.reduce_all(uniform.prob(uniform.sample(10)) > 0).eval()) + + def testUniformBroadcasting(self): + with self.test_session(): + a = 10.0 + b = [11.0, 20.0] + uniform = uniform_lib.Uniform(a, b) + + pdf = uniform.prob([[10.5, 11.5], [9.0, 19.0], [10.5, 21.0]]) + expected_pdf = np.array([[1.0, 0.1], [0.0, 0.1], [1.0, 0.0]]) + self.assertAllClose(expected_pdf, pdf.eval()) + + def testUniformSampleWithShape(self): + with self.test_session(): + a = 10.0 + b = [11.0, 20.0] + uniform = uniform_lib.Uniform(a, b) + + pdf = uniform.prob(uniform.sample((2, 3))) + # pylint: disable=bad-continuation + expected_pdf = [ + [[1.0, 0.1], [1.0, 0.1], [1.0, 0.1]], + [[1.0, 0.1], [1.0, 0.1], [1.0, 0.1]], + ] + # pylint: enable=bad-continuation + self.assertAllClose(expected_pdf, pdf.eval()) + + pdf = uniform.prob(uniform.sample()) + expected_pdf = [1.0, 0.1] + self.assertAllClose(expected_pdf, pdf.eval()) + + +if __name__ == "__main__": + test.main() diff --git a/tensorflow/python/ops/distributions/BUILD b/tensorflow/python/ops/distributions/BUILD index 90d3f04c72..833239eb5f 100644 --- a/tensorflow/python/ops/distributions/BUILD +++ b/tensorflow/python/ops/distributions/BUILD @@ -24,6 +24,7 @@ py_library( "//tensorflow/python:nn", "//tensorflow/python:nn_ops", "//tensorflow/python:platform", + "//tensorflow/python:special_math_ops", "//tensorflow/python:util", ], ) diff --git a/tensorflow/python/ops/distributions/bernoulli.py b/tensorflow/python/ops/distributions/bernoulli.py new file mode 100644 index 0000000000..3281b57e83 --- /dev/null +++ b/tensorflow/python/ops/distributions/bernoulli.py @@ -0,0 +1,215 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""The Bernoulli distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import control_flow_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.ops.distributions import util as distribution_util + + +class Bernoulli(distribution.Distribution): + """Bernoulli distribution. + + The Bernoulli distribution with `probs` parameter, i.e., the probability of a + `1` outcome (vs a `0` outcome). + """ + + def __init__(self, + logits=None, + probs=None, + dtype=dtypes.int32, + validate_args=False, + allow_nan_stats=True, + name="Bernoulli"): + """Construct Bernoulli distributions. 
+ + Args: + logits: An N-D `Tensor` representing the log-odds of a `1` event. Each + entry in the `Tensor` parametrizes an independent Bernoulli distribution + where the probability of an event is sigmoid(logits). Only one of + `logits` or `probs` should be passed in. + probs: An N-D `Tensor` representing the probability of a `1` + event. Each entry in the `Tensor` parameterizes an independent + Bernoulli distribution. Only one of `logits` or `probs` should be passed + in. + dtype: The type of the event samples. Default: `int32`. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, + statistics (e.g., mean, mode, variance) use the value "`NaN`" to + indicate the result is undefined. When `False`, an exception is raised + if one or more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. + + Raises: + ValueError: If p and logits are passed, or if neither are passed. + """ + parameters = locals() + with ops.name_scope(name): + self._logits, self._probs = distribution_util.get_logits_and_probs( + logits=logits, + probs=probs, + validate_args=validate_args, + name=name) + super(Bernoulli, self).__init__( + dtype=dtype, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + parameters=parameters, + graph_parents=[self._logits, self._probs], + name=name) + + @staticmethod + def _param_shapes(sample_shape): + return {"logits": ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)} + + @property + def logits(self): + """Log-odds of a `1` outcome (vs `0`).""" + return self._logits + + @property + def probs(self): + """Probability of a `1` outcome (vs `0`).""" + return self._probs + + def _batch_shape_tensor(self): + return array_ops.shape(self._logits) + + def _batch_shape(self): + return self._logits.get_shape() + + def _event_shape_tensor(self): + return array_ops.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + def _sample_n(self, n, seed=None): + new_shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) + uniform = random_ops.random_uniform( + new_shape, seed=seed, dtype=self.probs.dtype) + sample = math_ops.less(uniform, self.probs) + return math_ops.cast(sample, self.dtype) + + def _log_prob(self, event): + event = self._maybe_assert_valid_sample(event) + # TODO(jaana): The current sigmoid_cross_entropy_with_logits has + # inconsistent behavior for logits = inf/-inf. + event = math_ops.cast(event, self.logits.dtype) + logits = self.logits + # sigmoid_cross_entropy_with_logits doesn't broadcast shape, + # so we do this here. + + def _broadcast(logits, event): + return (array_ops.ones_like(event) * logits, + array_ops.ones_like(logits) * event) + + # First check static shape. 
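+    # If both shapes are fully defined we can decide broadcasting at graph
+    # construction time; otherwise fall back to a dynamic shape comparison via
+    # control_flow_ops.cond.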
+ if (event.get_shape().is_fully_defined() and + logits.get_shape().is_fully_defined()): + if event.get_shape() != logits.get_shape(): + logits, event = _broadcast(logits, event) + else: + logits, event = control_flow_ops.cond( + distribution_util.same_dynamic_shape(logits, event), + lambda: (logits, event), + lambda: _broadcast(logits, event)) + return -nn.sigmoid_cross_entropy_with_logits(labels=event, logits=logits) + + def _prob(self, event): + return math_ops.exp(self._log_prob(event)) + + def _entropy(self): + return (-self.logits * (math_ops.sigmoid(self.logits) - 1) + + nn.softplus(-self.logits)) + + def _mean(self): + return array_ops.identity(self.probs) + + def _variance(self): + return self._mean() * (1. - self.probs) + + def _mode(self): + """Returns `1` if `prob > 0.5` and `0` otherwise.""" + return math_ops.cast(self.probs > 0.5, self.dtype) + + def _maybe_assert_valid_sample(self, event, check_integer=True): + if not self.validate_args: + return event + event = distribution_util.embed_check_nonnegative_discrete( + event, check_integer=check_integer) + return control_flow_ops.with_dependencies([ + check_ops.assert_less_equal( + event, array_ops.ones_like(event), + message="event is not less than or equal to 1."), + ], event) + + +class BernoulliWithSigmoidProbs(Bernoulli): + """Bernoulli with `probs = nn.sigmoid(logits)`.""" + + def __init__(self, + logits=None, + dtype=dtypes.int32, + validate_args=False, + allow_nan_stats=True, + name="BernoulliWithSigmoidProbs"): + parameters = locals() + with ops.name_scope(name): + super(BernoulliWithSigmoidProbs, self).__init__( + probs=nn.sigmoid(logits, name="sigmoid_probs"), + dtype=dtype, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + name=name) + self._parameters = parameters + + +@kullback_leibler.RegisterKL(Bernoulli, Bernoulli) +def _kl_bernoulli_bernoulli(a, b, name=None): + """Calculate the batched KL divergence KL(a || b) with a and b Bernoulli. + + Args: + a: instance of a Bernoulli distribution object. + b: instance of a Bernoulli distribution object. + name: (optional) Name to use for created operations. + default is "kl_bernoulli_bernoulli". + + Returns: + Batchwise KL(a || b) + """ + with ops.name_scope(name, "kl_bernoulli_bernoulli", + values=[a.logits, b.logits]): + delta_probs0 = nn.softplus(-b.logits) - nn.softplus(-a.logits) + delta_probs1 = nn.softplus(b.logits) - nn.softplus(a.logits) + return (math_ops.sigmoid(a.logits) * delta_probs0 + + math_ops.sigmoid(-a.logits) * delta_probs1) diff --git a/tensorflow/python/ops/distributions/beta.py b/tensorflow/python/ops/distributions/beta.py new file mode 100644 index 0000000000..2b93478cdf --- /dev/null +++ b/tensorflow/python/ops/distributions/beta.py @@ -0,0 +1,366 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""The Beta distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import control_flow_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.ops.distributions import util as distribution_util + + +__all__ = [ + "Beta", + "BetaWithSoftplusConcentration", +] + + +_beta_sample_note = """Note: `x` must have dtype `self.dtype` and be in +`[0, 1].` It must have a shape compatible with `self.batch_shape()`.""" + + +class Beta(distribution.Distribution): + """Beta distribution. + + The Beta distribution is defined over the `(0, 1)` interval using parameters + `concentration1` (aka "alpha") and `concentration0` (aka "beta"). + + #### Mathematical Details + + The probability density function (pdf) is, + + ```none + pdf(x; alpha, beta) = x**(alpha - 1) (1 - x)**(beta - 1) / Z + Z = Gamma(alpha) Gamma(beta) / Gamma(alpha + beta) + ``` + + where: + + * `concentration1 = alpha`, + * `concentration0 = beta`, + * `Z` is the normalization constant, and, + * `Gamma` is the [gamma function]( + https://en.wikipedia.org/wiki/Gamma_function). + + The concentration parameters represent mean total counts of a `1` or a `0`, + i.e., + + ```none + concentration1 = alpha = mean * total_concentration + concentration0 = beta = (1. - mean) * total_concentration + ``` + + where `mean` in `(0, 1)` and `total_concentration` is a positive real number + representing a mean `total_count = concentration1 + concentration0`. + + Distribution parameters are automatically broadcast in all functions; see + examples for details. + + #### Examples + + ```python + # Create a batch of three Beta distributions. + alpha = [1, 2, 3] + beta = [1, 2, 3] + dist = Beta(alpha, beta) + + dist.sample([4, 5]) # Shape [4, 5, 3] + + # `x` has three batch entries, each with two samples. + x = [[.1, .4, .5], + [.2, .3, .5]] + # Calculate the probability of each pair of samples under the corresponding + # distribution in `dist`. + dist.prob(x) # Shape [2, 3] + ``` + + ```python + # Create batch_shape=[2, 3] via parameter broadcast: + alpha = [[1.], [2]] # Shape [2, 1] + beta = [3., 4, 5] # Shape [3] + dist = Beta(alpha, beta) + + # alpha broadcast as: [[1., 1, 1,], + # [2, 2, 2]] + # beta broadcast as: [[3., 4, 5], + # [3, 4, 5]] + # batch_Shape [2, 3] + dist.sample([4, 5]) # Shape [4, 5, 2, 3] + + x = [.2, .3, .5] + # x will be broadcast as [[.2, .3, .5], + # [.2, .3, .5]], + # thus matching batch_shape [2, 3]. + dist.prob(x) # Shape [2, 3] + ``` + + """ + + def __init__(self, + concentration1=None, + concentration0=None, + validate_args=False, + allow_nan_stats=True, + name="Beta"): + """Initialize a batch of Beta distributions. + + Args: + concentration1: Positive floating-point `Tensor` indicating mean + number of successes; aka "alpha". 
Implies `self.dtype` and + `self.batch_shape`, i.e., + `concentration1.shape = [N1, N2, ..., Nm] = self.batch_shape`. + concentration0: Positive floating-point `Tensor` indicating mean + number of failures; aka "beta". Otherwise has same semantics as + `concentration1`. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, statistics + (e.g., mean, mode, variance) use the value "`NaN`" to indicate the + result is undefined. When `False`, an exception is raised if one or + more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. + """ + parameters = locals() + with ops.name_scope(name, values=[concentration1, concentration0]): + self._concentration1 = self._maybe_assert_valid_concentration( + ops.convert_to_tensor(concentration1, name="concentration1"), + validate_args) + self._concentration0 = self._maybe_assert_valid_concentration( + ops.convert_to_tensor(concentration0, name="concentration0"), + validate_args) + check_ops.assert_same_float_dtype([ + self._concentration1, self._concentration0]) + self._total_concentration = self._concentration1 + self._concentration0 + super(Beta, self).__init__( + dtype=self._total_concentration.dtype, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + parameters=parameters, + graph_parents=[self._concentration1, + self._concentration0, + self._total_concentration], + name=name) + + @staticmethod + def _param_shapes(sample_shape): + return dict(zip( + ["concentration1", "concentration0"], + [ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)] * 2)) + + @property + def concentration1(self): + """Concentration parameter associated with a `1` outcome.""" + return self._concentration1 + + @property + def concentration0(self): + """Concentration parameter associated with a `0` outcome.""" + return self._concentration0 + + @property + def total_concentration(self): + """Sum of concentration parameters.""" + return self._total_concentration + + def _batch_shape_tensor(self): + return array_ops.shape(self.total_concentration) + + def _batch_shape(self): + return self.total_concentration.get_shape() + + def _event_shape_tensor(self): + return constant_op.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + def _sample_n(self, n, seed=None): + expanded_concentration1 = array_ops.ones_like( + self.total_concentration, dtype=self.dtype) * self.concentration1 + expanded_concentration0 = array_ops.ones_like( + self.total_concentration, dtype=self.dtype) * self.concentration0 + gamma1_sample = random_ops.random_gamma( + shape=[n], + alpha=expanded_concentration1, + dtype=self.dtype, + seed=seed) + gamma2_sample = random_ops.random_gamma( + shape=[n], + alpha=expanded_concentration0, + dtype=self.dtype, + seed=distribution_util.gen_new_seed(seed, "beta")) + beta_sample = gamma1_sample / (gamma1_sample + gamma2_sample) + return beta_sample + + @distribution_util.AppendDocstring(_beta_sample_note) + def _log_prob(self, x): + return self._log_unnormalized_prob(x) - self._log_normalization() + + @distribution_util.AppendDocstring(_beta_sample_note) + def _prob(self, x): + return math_ops.exp(self._log_prob(x)) + + 
@distribution_util.AppendDocstring(_beta_sample_note) + def _log_cdf(self, x): + return math_ops.log(self._cdf(x)) + + @distribution_util.AppendDocstring(_beta_sample_note) + def _cdf(self, x): + return math_ops.betainc(self.concentration1, self.concentration0, x) + + def _log_unnormalized_prob(self, x): + x = self._maybe_assert_valid_sample(x) + return ((self.concentration1 - 1.) * math_ops.log(x) + + (self.concentration0 - 1.) * math_ops.log1p(-x)) + + def _log_normalization(self): + return (math_ops.lgamma(self.concentration1) + + math_ops.lgamma(self.concentration0) + - math_ops.lgamma(self.total_concentration)) + + def _entropy(self): + return ( + self._log_normalization() + - (self.concentration1 - 1.) * math_ops.digamma(self.concentration1) + - (self.concentration0 - 1.) * math_ops.digamma(self.concentration0) + + ((self.total_concentration - 2.) * + math_ops.digamma(self.total_concentration))) + + def _mean(self): + return self._concentration1 / self._total_concentration + + def _variance(self): + return self._mean() * (1. - self._mean()) / (1. + self.total_concentration) + + @distribution_util.AppendDocstring( + """Note: The mode is undefined when `concentration1 <= 1` or + `concentration0 <= 1`. If `self.allow_nan_stats` is `True`, `NaN` + is used for undefined modes. If `self.allow_nan_stats` is `False` an + exception is raised when one or more modes are undefined.""") + def _mode(self): + mode = (self.concentration1 - 1.) / (self.total_concentration - 2.) + if self.allow_nan_stats: + nan = array_ops.fill( + self.batch_shape_tensor(), + np.array(np.nan, dtype=self.dtype.as_numpy_dtype()), + name="nan") + is_defined = math_ops.logical_and(self.concentration1 > 1., + self.concentration0 > 1.) + return array_ops.where(is_defined, mode, nan) + return control_flow_ops.with_dependencies([ + check_ops.assert_less( + array_ops.ones([], dtype=self.dtype), + self.concentration1, + message="Mode undefined for concentration1 <= 1."), + check_ops.assert_less( + array_ops.ones([], dtype=self.dtype), + self.concentration0, + message="Mode undefined for concentration0 <= 1.") + ], mode) + + def _maybe_assert_valid_concentration(self, concentration, validate_args): + """Checks the validity of a concentration parameter.""" + if not validate_args: + return concentration + return control_flow_ops.with_dependencies([ + check_ops.assert_positive( + concentration, + message="Concentration parameter must be positive."), + ], concentration) + + def _maybe_assert_valid_sample(self, x): + """Checks the validity of a sample.""" + if not self.validate_args: + return x + return control_flow_ops.with_dependencies([ + check_ops.assert_positive( + x, + message="sample must be positive"), + check_ops.assert_less( + x, array_ops.ones([], self.dtype), + message="sample must be no larger than `1`."), + ], x) + + +class BetaWithSoftplusConcentration(Beta): + """Beta with softplus transform of `concentration1` and `concentration0`.""" + + def __init__(self, + concentration1, + concentration0, + validate_args=False, + allow_nan_stats=True, + name="BetaWithSoftplusConcentration"): + parameters = locals() + with ops.name_scope(name, values=[concentration1, + concentration0]) as ns: + super(BetaWithSoftplusConcentration, self).__init__( + concentration1=nn.softplus(concentration1, + name="softplus_concentration1"), + concentration0=nn.softplus(concentration0, + name="softplus_concentration0"), + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + name=ns) + self._parameters = parameters + + 
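The `_kl_beta_beta` registration below reduces KL(Beta(a1, b1) || Beta(a2, b2)) to a closed form built from `lgamma` and `digamma`. A minimal NumPy/SciPy sketch of the same identity, cross-checked with a Monte Carlo estimate (illustrative only; the parameter values and the SciPy dependency are assumptions, not part of this file):

```python
# Sketch: closed-form KL(Beta(a1, b1) || Beta(a2, b2)) vs. a Monte Carlo
# estimate. Assumes SciPy is available; values are arbitrary examples.
import numpy as np
from scipy.special import betaln, digamma
from scipy.stats import beta as scipy_beta

a1, b1, a2, b2 = 2.0, 3.0, 4.0, 1.5

# Same terms as _kl_beta_beta: delta of log-normalizers plus digamma terms.
closed_form = (betaln(a2, b2) - betaln(a1, b1)
               - (a2 - a1) * digamma(a1)
               - (b2 - b1) * digamma(b1)
               + ((a2 + b2) - (a1 + b1)) * digamma(a1 + b1))

# Monte Carlo estimate of E_{x~Beta(a1, b1)}[log p1(x) - log p2(x)].
x = scipy_beta(a1, b1).rvs(200000, random_state=0)
mc_estimate = np.mean(scipy_beta(a1, b1).logpdf(x) - scipy_beta(a2, b2).logpdf(x))

print(closed_form, mc_estimate)  # Agree to within Monte Carlo error (~0.01).
```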
+@kullback_leibler.RegisterKL(Beta, Beta) +def _kl_beta_beta(d1, d2, name=None): + """Calculate the batchwise KL divergence KL(d1 || d2) with d1 and d2 Beta. + + Args: + d1: instance of a Beta distribution object. + d2: instance of a Beta distribution object. + name: (optional) Name to use for created operations. + default is "kl_beta_beta". + + Returns: + Batchwise KL(d1 || d2) + """ + def delta(fn, is_property=True): + fn1 = getattr(d1, fn) + fn2 = getattr(d2, fn) + return (fn2 - fn1) if is_property else (fn2() - fn1()) + with ops.name_scope(name, "kl_beta_beta", values=[ + d1.concentration1, + d1.concentration0, + d1.total_concentration, + d2.concentration1, + d2.concentration0, + d2.total_concentration, + ]): + return (delta("_log_normalization", is_property=False) + - math_ops.digamma(d1.concentration1) * delta("concentration1") + - math_ops.digamma(d1.concentration0) * delta("concentration0") + + (math_ops.digamma(d1.total_concentration) + * delta("total_concentration"))) diff --git a/tensorflow/python/ops/distributions/categorical.py b/tensorflow/python/ops/distributions/categorical.py new file mode 100644 index 0000000000..1b74c2f0ca --- /dev/null +++ b/tensorflow/python/ops/distributions/categorical.py @@ -0,0 +1,242 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""The Categorical distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn_ops +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.ops.distributions import util as distribution_util + + +class Categorical(distribution.Distribution): + """Categorical distribution. + + The categorical distribution is parameterized by the log-probabilities + of a set of classes. + + #### Examples + + Creates a 3-class distribution, with the 2nd class, the most likely to be + drawn from. + + ```python + p = [0.1, 0.5, 0.4] + dist = Categorical(probs=p) + ``` + + Creates a 3-class distribution, with the 2nd class the most likely to be + drawn from, using logits. + + ```python + logits = [-50, 400, 40] + dist = Categorical(logits=logits) + ``` + + Creates a 3-class distribution, with the 3rd class is most likely to be drawn. + The distribution functions can be evaluated on counts. + + ```python + # counts is a scalar. 
+ p = [0.1, 0.4, 0.5] + dist = Categorical(probs=p) + dist.prob(0) # Shape [] + + # p will be broadcast to [[0.1, 0.4, 0.5], [0.1, 0.4, 0.5]] to match counts. + counts = [1, 0] + dist.prob(counts) # Shape [2] + + # p will be broadcast to shape [3, 5, 7, 3] to match counts. + counts = [[...]] # Shape [5, 7, 3] + dist.prob(counts) # Shape [5, 7, 3] + ``` + + """ + + def __init__( + self, + logits=None, + probs=None, + dtype=dtypes.int32, + validate_args=False, + allow_nan_stats=True, + name="Categorical"): + """Initialize Categorical distributions using class log-probabilities. + + Args: + logits: An N-D `Tensor`, `N >= 1`, representing the log probabilities + of a set of Categorical distributions. The first `N - 1` dimensions + index into a batch of independent distributions and the last dimension + represents a vector of logits for each class. Only one of `logits` or + `probs` should be passed in. + probs: An N-D `Tensor`, `N >= 1`, representing the probabilities + of a set of Categorical distributions. The first `N - 1` dimensions + index into a batch of independent distributions and the last dimension + represents a vector of probabilities for each class. Only one of + `logits` or `probs` should be passed in. + dtype: The type of the event samples (default: int32). + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, statistics + (e.g., mean, mode, variance) use the value "`NaN`" to indicate the + result is undefined. When `False`, an exception is raised if one or + more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. 
+ """ + parameters = locals() + with ops.name_scope(name, values=[logits, probs]): + self._logits, self._probs = distribution_util.get_logits_and_probs( + logits=logits, + probs=probs, + validate_args=validate_args, + multidimensional=True, + name=name) + + logits_shape_static = self._logits.get_shape().with_rank_at_least(1) + if logits_shape_static.ndims is not None: + self._batch_rank = ops.convert_to_tensor( + logits_shape_static.ndims - 1, + dtype=dtypes.int32, + name="batch_rank") + else: + with ops.name_scope(name="batch_rank"): + self._batch_rank = array_ops.rank(self._logits) - 1 + + logits_shape = array_ops.shape(self._logits, name="logits_shape") + if logits_shape_static[-1].value is not None: + self._event_size = ops.convert_to_tensor( + logits_shape_static[-1].value, + dtype=dtypes.int32, + name="event_size") + else: + with ops.name_scope(name="event_size"): + self._event_size = logits_shape[self._batch_rank] + + if logits_shape_static[:-1].is_fully_defined(): + self._batch_shape_val = constant_op.constant( + logits_shape_static[:-1].as_list(), + dtype=dtypes.int32, + name="batch_shape") + else: + with ops.name_scope(name="batch_shape"): + self._batch_shape_val = logits_shape[:-1] + super(Categorical, self).__init__( + dtype=dtype, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + parameters=parameters, + graph_parents=[self._logits, + self._probs], + name=name) + + @property + def event_size(self): + """Scalar `int32` tensor: the number of classes.""" + return self._event_size + + @property + def logits(self): + """Vector of coordinatewise logits.""" + return self._logits + + @property + def probs(self): + """Vector of coordinatewise probabilities.""" + return self._probs + + def _batch_shape_tensor(self): + return array_ops.identity(self._batch_shape_val) + + def _batch_shape(self): + return self.logits.get_shape()[:-1] + + def _event_shape_tensor(self): + return constant_op.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + def _sample_n(self, n, seed=None): + if self.logits.get_shape().ndims == 2: + logits_2d = self.logits + else: + logits_2d = array_ops.reshape(self.logits, [-1, self.event_size]) + samples = random_ops.multinomial(logits_2d, n, seed=seed) + samples = math_ops.cast(samples, self.dtype) + ret = array_ops.reshape( + array_ops.transpose(samples), + array_ops.concat([[n], self.batch_shape_tensor()], 0)) + return ret + + def _log_prob(self, k): + k = ops.convert_to_tensor(k, name="k") + if self.logits.get_shape()[:-1] == k.get_shape(): + logits = self.logits + else: + logits = self.logits * array_ops.ones_like( + array_ops.expand_dims(k, -1), dtype=self.logits.dtype) + logits_shape = array_ops.shape(logits)[:-1] + k *= array_ops.ones(logits_shape, dtype=k.dtype) + k.set_shape(tensor_shape.TensorShape(logits.get_shape()[:-1])) + return -nn_ops.sparse_softmax_cross_entropy_with_logits(labels=k, + logits=logits) + + def _prob(self, k): + return math_ops.exp(self._log_prob(k)) + + def _entropy(self): + return -math_ops.reduce_sum( + nn_ops.log_softmax(self.logits) * self.probs, axis=-1) + + def _mode(self): + ret = math_ops.argmax(self.logits, dimension=self._batch_rank) + ret = math_ops.cast(ret, self.dtype) + ret.set_shape(self.batch_shape) + return ret + + +@kullback_leibler.RegisterKL(Categorical, Categorical) +def _kl_categorical_categorical(a, b, name=None): + """Calculate the batched KL divergence KL(a || b) with a and b Categorical. 
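+
+  The divergence is computed in closed form over the event (class) dimension
+  as `sum_i probs_a[i] * (log(probs_a[i]) - log(probs_b[i]))`, where `probs_a`
+  and `probs_b` are the class probabilities of `a` and `b`; the two
+  distributions must have the same number of classes.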
+
+  Args:
+    a: instance of a Categorical distribution object.
+    b: instance of a Categorical distribution object.
+    name: (optional) Name to use for created operations.
+      default is "kl_categorical_categorical".
+
+  Returns:
+    Batchwise KL(a || b)
+  """
+  with ops.name_scope(name, "kl_categorical_categorical",
+                      values=[a.logits, b.logits]):
+    # sum(probs_a * log(probs_a / probs_b)) over the event dimension
+    delta_log_probs1 = (nn_ops.log_softmax(a.logits) -
+                        nn_ops.log_softmax(b.logits))
+    return math_ops.reduce_sum(nn_ops.softmax(a.logits) * delta_log_probs1,
+                               axis=-1)
diff --git a/tensorflow/python/ops/distributions/conditional_distribution.py b/tensorflow/python/ops/distributions/conditional_distribution.py
index a04373afbf..ef25d4aedd 100644
--- a/tensorflow/python/ops/distributions/conditional_distribution.py
+++ b/tensorflow/python/ops/distributions/conditional_distribution.py
@@ -18,8 +18,8 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 
-from tensorflow.contrib.distributions.python.ops import distribution_util
 from tensorflow.python.ops.distributions import distribution
+from tensorflow.python.ops.distributions import util as distribution_util
 
 
 class ConditionalDistribution(distribution.Distribution):
diff --git a/tensorflow/python/ops/distributions/dirichlet.py b/tensorflow/python/ops/distributions/dirichlet.py
new file mode 100644
index 0000000000..923696a553
--- /dev/null
+++ b/tensorflow/python/ops/distributions/dirichlet.py
@@ -0,0 +1,297 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""The Dirichlet distribution class."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import check_ops
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import random_ops
+from tensorflow.python.ops import special_math_ops
+from tensorflow.python.ops.distributions import distribution
+from tensorflow.python.ops.distributions import util as distribution_util
+
+
+__all__ = [
+    "Dirichlet",
+]
+
+
+_dirichlet_sample_note = """Note: `value` must be a non-negative tensor with
+dtype `self.dtype` and be in the `(self.event_shape() - 1)`-simplex, i.e.,
+`tf.reduce_sum(value, -1) = 1`. It must have a shape compatible with
+`self.batch_shape() + self.event_shape()`."""
+
+
+class Dirichlet(distribution.Distribution):
+  """Dirichlet distribution.
+
+  The Dirichlet distribution is defined over the
+  [`(k-1)`-simplex](https://en.wikipedia.org/wiki/Simplex) using a positive,
+  length-`k` vector `concentration` (`k > 1`). The Dirichlet is identically the
+  Beta distribution when `k = 2`.
+ + #### Mathematical Details + + The Dirichlet is a distribution over the open `(k-1)`-simplex, i.e., + + ```none + S^{k-1} = { (x_0, ..., x_{k-1}) in R^k : sum_j x_j = 1 and all_j x_j > 0 }. + ``` + + The probability density function (pdf) is, + + ```none + pdf(x; alpha) = prod_j x_j**(alpha_j - 1) / Z + Z = prod_j Gamma(alpha_j) / Gamma(sum_j alpha_j) + ``` + + where: + + * `x in S^{k-1}`, i.e., the `(k-1)`-simplex, + * `concentration = alpha = [alpha_0, ..., alpha_{k-1}]`, `alpha_j > 0`, + * `Z` is the normalization constant aka the [multivariate beta function]( + https://en.wikipedia.org/wiki/Beta_function#Multivariate_beta_function), + and, + * `Gamma` is the [gamma function]( + https://en.wikipedia.org/wiki/Gamma_function). + + The `concentration` represents mean total counts of class occurrence, i.e., + + ```none + concentration = alpha = mean * total_concentration + ``` + + where `mean` in `S^{k-1}` and `total_concentration` is a positive real number + representing a mean total count. + + Distribution parameters are automatically broadcast in all functions; see + examples for details. + + #### Examples + + ```python + # Create a single trivariate Dirichlet, with the 3rd class being three times + # more frequent than the first. I.e., batch_shape=[], event_shape=[3]. + alpha = [1., 2, 3] + dist = Dirichlet(alpha) + + dist.sample([4, 5]) # shape: [4, 5, 3] + + # x has one sample, one batch, three classes: + x = [.2, .3, .5] # shape: [3] + dist.prob(x) # shape: [] + + # x has two samples from one batch: + x = [[.1, .4, .5], + [.2, .3, .5]] + dist.prob(x) # shape: [2] + + # alpha will be broadcast to shape [5, 7, 3] to match x. + x = [[...]] # shape: [5, 7, 3] + dist.prob(x) # shape: [5, 7] + ``` + + ```python + # Create batch_shape=[2], event_shape=[3]: + alpha = [[1., 2, 3], + [4, 5, 6]] # shape: [2, 3] + dist = Dirichlet(alpha) + + dist.sample([4, 5]) # shape: [4, 5, 2, 3] + + x = [.2, .3, .5] + # x will be broadcast as [[.2, .3, .5], + # [.2, .3, .5]], + # thus matching batch_shape [2, 3]. + dist.prob(x) # shape: [2] + ``` + + """ + + def __init__(self, + concentration, + validate_args=False, + allow_nan_stats=True, + name="Dirichlet"): + """Initialize a batch of Dirichlet distributions. + + Args: + concentration: Positive floating-point `Tensor` indicating mean number + of class occurrences; aka "alpha". Implies `self.dtype`, and + `self.batch_shape`, `self.event_shape`, i.e., if + `concentration.shape = [N1, N2, ..., Nm, k]` then + `batch_shape = [N1, N2, ..., Nm]` and + `event_shape = [k]`. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, statistics + (e.g., mean, mode, variance) use the value "`NaN`" to indicate the + result is undefined. When `False`, an exception is raised if one or + more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. 
+ """ + parameters = locals() + with ops.name_scope(name, values=[concentration]): + self._concentration = self._maybe_assert_valid_concentration( + ops.convert_to_tensor(concentration, name="concentration"), + validate_args) + self._total_concentration = math_ops.reduce_sum(self._concentration, -1) + super(Dirichlet, self).__init__( + dtype=self._concentration.dtype, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + parameters=parameters, + graph_parents=[self._concentration, + self._total_concentration], + name=name) + + @property + def concentration(self): + """Concentration parameter; expected counts for that coordinate.""" + return self._concentration + + @property + def total_concentration(self): + """Sum of last dim of concentration parameter.""" + return self._total_concentration + + def _batch_shape_tensor(self): + return array_ops.shape(self.total_concentration) + + def _batch_shape(self): + return self.total_concentration.get_shape() + + def _event_shape_tensor(self): + return array_ops.shape(self.concentration)[-1:] + + def _event_shape(self): + return self.concentration.get_shape().with_rank_at_least(1)[-1:] + + def _sample_n(self, n, seed=None): + gamma_sample = random_ops.random_gamma( + shape=[n], + alpha=self.concentration, + dtype=self.dtype, + seed=seed) + return gamma_sample / math_ops.reduce_sum(gamma_sample, -1, keep_dims=True) + + @distribution_util.AppendDocstring(_dirichlet_sample_note) + def _log_prob(self, x): + return self._log_unnormalized_prob(x) - self._log_normalization() + + @distribution_util.AppendDocstring(_dirichlet_sample_note) + def _prob(self, x): + return math_ops.exp(self._log_prob(x)) + + def _log_unnormalized_prob(self, x): + x = self._maybe_assert_valid_sample(x) + return math_ops.reduce_sum((self.concentration - 1.) * math_ops.log(x), -1) + + def _log_normalization(self): + return special_math_ops.lbeta(self.concentration) + + def _entropy(self): + k = math_ops.cast(self.event_shape_tensor()[0], self.dtype) + return ( + self._log_normalization() + + ((self.total_concentration - k) + * math_ops.digamma(self.total_concentration)) + - math_ops.reduce_sum( + (self.concentration - 1.) * math_ops.digamma(self.concentration), + axis=-1)) + + def _mean(self): + return self.concentration / self.total_concentration[..., array_ops.newaxis] + + def _covariance(self): + x = self._variance_scale_term() * self._mean() + return array_ops.matrix_set_diag( + -math_ops.matmul(x[..., array_ops.newaxis], + x[..., array_ops.newaxis, :]), # outer prod + self._variance()) + + def _variance(self): + scale = self._variance_scale_term() + x = scale * self._mean() + return x * (scale - x) + + def _variance_scale_term(self): + """Helper to `_covariance` and `_variance` which computes a shared scale.""" + return math_ops.rsqrt(1. + self.total_concentration[..., array_ops.newaxis]) + + @distribution_util.AppendDocstring( + """Note: The mode is undefined when any `concentration <= 1`. If + `self.allow_nan_stats` is `True`, `NaN` is used for undefined modes. If + `self.allow_nan_stats` is `False` an exception is raised when one or more + modes are undefined.""") + def _mode(self): + k = math_ops.cast(self.event_shape_tensor()[0], self.dtype) + mode = (self.concentration - 1.) 
/ ( + self.total_concentration[..., array_ops.newaxis] - k) + if self.allow_nan_stats: + nan = array_ops.fill( + array_ops.shape(mode), + np.array(np.nan, dtype=self.dtype.as_numpy_dtype()), + name="nan") + return array_ops.where( + math_ops.reduce_all(self.concentration > 1., axis=-1), + mode, nan) + return control_flow_ops.with_dependencies([ + check_ops.assert_less( + array_ops.ones([], self.dtype), + self.concentration, + message="Mode undefined when any concentration <= 1"), + ], mode) + + def _maybe_assert_valid_concentration(self, concentration, validate_args): + """Checks the validity of the concentration parameter.""" + if not validate_args: + return concentration + return control_flow_ops.with_dependencies([ + check_ops.assert_positive( + concentration, + message="Concentration parameter must be positive."), + check_ops.assert_rank_at_least( + concentration, 1, + message="Concentration parameter must have >=1 dimensions."), + check_ops.assert_less( + 1, array_ops.shape(concentration)[-1], + message="Concentration parameter must have event_size >= 2."), + ], concentration) + + def _maybe_assert_valid_sample(self, x): + """Checks the validity of a sample.""" + if not self.validate_args: + return x + return control_flow_ops.with_dependencies([ + check_ops.assert_positive( + x, + message="samples must be positive"), + distribution_util.assert_close( + array_ops.ones([], dtype=self.dtype), + math_ops.reduce_sum(x, -1), + message="sample last-dimension must sum to `1`"), + ], x) diff --git a/tensorflow/python/ops/distributions/dirichlet_multinomial.py b/tensorflow/python/ops/distributions/dirichlet_multinomial.py new file mode 100644 index 0000000000..662a765558 --- /dev/null +++ b/tensorflow/python/ops/distributions/dirichlet_multinomial.py @@ -0,0 +1,343 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""The DirichletMultinomial distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import control_flow_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import random_ops +from tensorflow.python.ops import special_math_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import util as distribution_util + + +__all__ = [ + "DirichletMultinomial", +] + + +_dirichlet_multinomial_sample_note = """For each batch of counts, +`value = [n_0, ..., n_{k-1}]`, `P[value]` is the probability that after +sampling `self.total_count` draws from this Dirichlet-Multinomial distribution, +the number of draws falling in class `j` is `n_j`. 
Since this definition is
+[exchangeable](https://en.wikipedia.org/wiki/Exchangeable_random_variables),
+different sequences with the same counts are equally likely, so the
+probability includes a combinatorial coefficient.
+
+Note: `value` must be a non-negative tensor with dtype `self.dtype`, have no
+fractional components, and such that
+`tf.reduce_sum(value, -1) = self.total_count`. Its shape must be broadcastable
+with `self.concentration` and `self.total_count`."""
+
+
+class DirichletMultinomial(distribution.Distribution):
+  """Dirichlet-Multinomial compound distribution.
+
+  The Dirichlet-Multinomial distribution is parameterized by a (batch of)
+  length-`k` `concentration` vectors (`k > 1`) and a `total_count` number of
+  trials, i.e., the number of trials per draw from the DirichletMultinomial. It
+  is defined over a (batch of) length-`k` vector `counts` such that
+  `tf.reduce_sum(counts, -1) = total_count`. The Dirichlet-Multinomial is
+  identically the Beta-Binomial distribution when `k = 2`.
+
+  #### Mathematical Details
+
+  The Dirichlet-Multinomial is a distribution over `k`-class counts, i.e., a
+  length-`k` vector of non-negative integer `counts = n = [n_0, ..., n_{k-1}]`.
+
+  The probability mass function (pmf) is,
+
+  ```none
+  pmf(n; alpha, N) = Beta(alpha + n) / (prod_j n_j!) / Z
+  Z = Beta(alpha) / N!
+  ```
+
+  where:
+
+  * `concentration = alpha = [alpha_0, ..., alpha_{k-1}]`, `alpha_j > 0`,
+  * `total_count = N`, `N` a positive integer,
+  * `N!` is `N` factorial, and,
+  * `Beta(x) = prod_j Gamma(x_j) / Gamma(sum_j x_j)` is the
+    [multivariate beta function](
+    https://en.wikipedia.org/wiki/Beta_function#Multivariate_beta_function),
+    and,
+  * `Gamma` is the [gamma function](
+    https://en.wikipedia.org/wiki/Gamma_function).
+
+  Dirichlet-Multinomial is a [compound distribution](
+  https://en.wikipedia.org/wiki/Compound_probability_distribution), i.e., its
+  samples are generated as follows.
+
+  1. Choose class probabilities:
+     `probs = [p_0,...,p_{k-1}] ~ Dir(concentration)`
+  2. Draw integers:
+     `counts = [n_0,...,n_{k-1}] ~ Multinomial(total_count, probs)`
+
+  The last `concentration` dimension parametrizes a single Dirichlet-Multinomial
+  distribution. When calling distribution functions (e.g., `dist.prob(counts)`),
+  `concentration`, `total_count` and `counts` are broadcast to the same shape.
+  The last dimension of `counts` corresponds to a single Dirichlet-Multinomial
+  distribution.
+
+  Distribution parameters are automatically broadcast in all functions; see
+  examples for details.
+
+  #### Examples
+
+  ```python
+  alpha = [1, 2, 3]
+  n = 2
+  dist = DirichletMultinomial(n, alpha)
+  ```
+
+  Creates a 3-class distribution, with the 3rd class the most likely to be
+  drawn. The distribution functions can be evaluated on counts.
+
+  ```python
+  # counts same shape as alpha.
+  counts = [0, 0, 2]
+  dist.prob(counts)  # Shape []
+
+  # alpha will be broadcast to [[1, 2, 3], [1, 2, 3]] to match counts.
+  counts = [[1, 1, 0], [1, 0, 1]]
+  dist.prob(counts)  # Shape [2]
+
+  # alpha will be broadcast to shape [5, 7, 3] to match counts.
+  counts = [[...]]  # Shape [5, 7, 3]
+  dist.prob(counts)  # Shape [5, 7]
+  ```
+
+  Creates a 2-batch of 3-class distributions.
+
+  ```python
+  alpha = [[1, 2, 3], [4, 5, 6]]  # Shape [2, 3]
+  n = [3, 3]
+  dist = DirichletMultinomial(n, alpha)
+
+  # counts will be broadcast to [[2, 1, 0], [2, 1, 0]] to match alpha.
+  counts = [2, 1, 0]
+  dist.prob(counts)  # Shape [2]
+  ```
+
+  """
+
+  # TODO(b/27419586) Change docstring for dtype of concentration once int
+  # allowed.
+ def __init__(self, + total_count, + concentration, + validate_args=False, + allow_nan_stats=True, + name="DirichletMultinomial"): + """Initialize a batch of DirichletMultinomial distributions. + + Args: + total_count: Non-negative floating point tensor, whose dtype is the same + as `concentration`. The shape is broadcastable to `[N1,..., Nm]` with + `m >= 0`. Defines this as a batch of `N1 x ... x Nm` different + Dirichlet multinomial distributions. Its components should be equal to + integer values. + concentration: Positive floating point tensor, whose dtype is the + same as `n` with shape broadcastable to `[N1,..., Nm, k]` `m >= 0`. + Defines this as a batch of `N1 x ... x Nm` different `k` class Dirichlet + multinomial distributions. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, statistics + (e.g., mean, mode, variance) use the value "`NaN`" to indicate the + result is undefined. When `False`, an exception is raised if one or + more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. + """ + parameters = locals() + with ops.name_scope(name, values=[total_count, concentration]): + # Broadcasting works because: + # * The broadcasting convention is to prepend dimensions of size [1], and + # we use the last dimension for the distribution, whereas + # the batch dimensions are the leading dimensions, which forces the + # distribution dimension to be defined explicitly (i.e. it cannot be + # created automatically by prepending). This forces enough explicitness. + # * All calls involving `counts` eventually require a broadcast between + # `counts` and concentration. + self._total_count = self._maybe_assert_valid_total_count( + ops.convert_to_tensor(total_count, name="total_count"), + validate_args) + self._concentration = self._maybe_assert_valid_concentration( + ops.convert_to_tensor(concentration, + name="concentration"), + validate_args) + self._total_concentration = math_ops.reduce_sum(self._concentration, -1) + super(DirichletMultinomial, self).__init__( + dtype=self._concentration.dtype, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + parameters=parameters, + graph_parents=[self._total_count, + self._concentration], + name=name) + + @property + def total_count(self): + """Number of trials used to construct a sample.""" + return self._total_count + + @property + def concentration(self): + """Concentration parameter; expected prior counts for that coordinate.""" + return self._concentration + + @property + def total_concentration(self): + """Sum of last dim of concentration parameter.""" + return self._total_concentration + + def _batch_shape_tensor(self): + return array_ops.shape(self.total_concentration) + + def _batch_shape(self): + return self.total_concentration.get_shape() + + def _event_shape_tensor(self): + return array_ops.shape(self.concentration)[-1:] + + def _event_shape(self): + # Event shape depends only on total_concentration, not "n". 
+ return self.concentration.get_shape().with_rank_at_least(1)[-1:] + + def _sample_n(self, n, seed=None): + n_draws = math_ops.cast(self.total_count, dtype=dtypes.int32) + k = self.event_shape_tensor()[0] + unnormalized_logits = array_ops.reshape( + math_ops.log(random_ops.random_gamma( + shape=[n], + alpha=self.concentration, + dtype=self.dtype, + seed=seed)), + shape=[-1, k]) + draws = random_ops.multinomial( + logits=unnormalized_logits, + num_samples=n_draws, + seed=distribution_util.gen_new_seed(seed, salt="dirichlet_multinomial")) + x = math_ops.reduce_sum(array_ops.one_hot(draws, depth=k), -2) + final_shape = array_ops.concat([[n], self.batch_shape_tensor(), [k]], 0) + return array_ops.reshape(x, final_shape) + + @distribution_util.AppendDocstring(_dirichlet_multinomial_sample_note) + def _log_prob(self, counts): + counts = self._maybe_assert_valid_sample(counts) + ordered_prob = ( + special_math_ops.lbeta(self.concentration + counts) + - special_math_ops.lbeta(self.concentration)) + return ordered_prob + distribution_util.log_combinations( + self.total_count, counts) + + @distribution_util.AppendDocstring(_dirichlet_multinomial_sample_note) + def _prob(self, counts): + return math_ops.exp(self._log_prob(counts)) + + def _mean(self): + return self.total_count * (self.concentration / + self.total_concentration[..., array_ops.newaxis]) + + @distribution_util.AppendDocstring( + """The covariance for each batch member is defined as the following: + + ```none + Var(X_j) = n * alpha_j / alpha_0 * (1 - alpha_j / alpha_0) * + (n + alpha_0) / (1 + alpha_0) + ``` + + where `concentration = alpha` and + `total_concentration = alpha_0 = sum_j alpha_j`. + + The covariance between elements in a batch is defined as: + + ```none + Cov(X_i, X_j) = -n * alpha_i * alpha_j / alpha_0 ** 2 * + (n + alpha_0) / (1 + alpha_0) + ``` + """) + def _covariance(self): + x = self._variance_scale_term() * self._mean() + return array_ops.matrix_set_diag( + -math_ops.matmul(x[..., array_ops.newaxis], + x[..., array_ops.newaxis, :]), # outer prod + self._variance()) + + def _variance(self): + scale = self._variance_scale_term() + x = scale * self._mean() + return x * (self.total_count * scale - x) + + def _variance_scale_term(self): + """Helper to `_covariance` and `_variance` which computes a shared scale.""" + # We must take care to expand back the last dim whenever we use the + # total_concentration. + c0 = self.total_concentration[..., array_ops.newaxis] + return math_ops.sqrt((1. + c0 / self.total_count) / (1. 
+ c0))
+
+  def _maybe_assert_valid_concentration(self, concentration, validate_args):
+    """Checks the validity of the concentration parameter."""
+    if not validate_args:
+      return concentration
+    return control_flow_ops.with_dependencies([
+        check_ops.assert_positive(
+            concentration,
+            message="Concentration parameter must be positive."),
+        check_ops.assert_rank_at_least(
+            concentration, 1,
+            message="Concentration parameter must have >=1 dimensions."),
+        check_ops.assert_less(
+            1, array_ops.shape(concentration)[-1],
+            message="Concentration parameter must have event_size >= 2."),
+    ], concentration)
+
+  def _maybe_assert_valid_total_count(self, total_count, validate_args):
+    if not validate_args:
+      return total_count
+    return control_flow_ops.with_dependencies([
+        check_ops.assert_non_negative(
+            total_count,
+            message="total_count must be non-negative."),
+        distribution_util.assert_integer_form(
+            total_count,
+            message="total_count cannot contain fractional values."),
+    ], total_count)
+
+  def _maybe_assert_valid_sample(self, counts):
+    """Check counts for proper shape, values, then return tensor version."""
+    if not self.validate_args:
+      return counts
+    return control_flow_ops.with_dependencies([
+        check_ops.assert_non_negative(
+            counts,
+            message="counts must be non-negative."),
+        check_ops.assert_equal(
+            self.total_count, math_ops.reduce_sum(counts, -1),
+            message="counts last-dimension must sum to `self.total_count`"),
+        distribution_util.assert_integer_form(
+            counts,
+            message="counts cannot contain fractional components."),
+    ], counts)
diff --git a/tensorflow/python/ops/distributions/exponential.py b/tensorflow/python/ops/distributions/exponential.py
new file mode 100644
index 0000000000..281641b915
--- /dev/null
+++ b/tensorflow/python/ops/distributions/exponential.py
@@ -0,0 +1,151 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""The Exponential distribution class."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn
+from tensorflow.python.ops import random_ops
+from tensorflow.python.ops.distributions import gamma
+
+
+__all__ = [
+    "Exponential",
+    "ExponentialWithSoftplusRate",
+]
+
+
+class Exponential(gamma.Gamma):
+  """Exponential distribution.
+
+  The Exponential distribution is parameterized by an event `rate` parameter.
+
+  #### Mathematical Details
+
+  The probability density function (pdf) is,
+
+  ```none
+  pdf(x; lambda, x > 0) = exp(-lambda x) / Z
+  Z = 1 / lambda
+  ```
+
+  where `rate = lambda` and `Z` is the normalizing constant.
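+
+  The mean of an `Exponential(rate)` distribution is `1 / rate` and its
+  standard deviation is also `1 / rate`.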
+
+  The Exponential distribution is a special case of the Gamma distribution,
+  i.e.,
+
+  ```python
+  Exponential(rate) = Gamma(concentration=1., rate)
+  ```
+
+  The Exponential distribution uses a `rate` parameter, or "inverse scale",
+  which can be intuited as,
+
+  ```none
+  X ~ Exponential(rate=1)
+  Y = X / rate
+  ```
+
+  """
+
+  def __init__(self,
+               rate,
+               validate_args=False,
+               allow_nan_stats=True,
+               name="Exponential"):
+    """Construct Exponential distribution with parameter `rate`.
+
+    Args:
+      rate: Floating point tensor, equivalent to `1 / mean`. Must contain only
+        positive values.
+      validate_args: Python `bool`, default `False`. When `True` distribution
+        parameters are checked for validity despite possibly degrading runtime
+        performance. When `False` invalid inputs may silently render incorrect
+        outputs.
+      allow_nan_stats: Python `bool`, default `True`. When `True`, statistics
+        (e.g., mean, mode, variance) use the value "`NaN`" to indicate the
+        result is undefined. When `False`, an exception is raised if one or
+        more of the statistic's batch members are undefined.
+      name: Python `str` name prefixed to Ops created by this class.
+    """
+    parameters = locals()
+    # Even though all statistics are defined for valid inputs, this is not
+    # true in the parent class "Gamma." Therefore, passing
+    # allow_nan_stats=True
+    # through to the parent class results in unnecessary asserts.
+    with ops.name_scope(name, values=[rate]):
+      self._rate = ops.convert_to_tensor(rate, name="rate")
+    super(Exponential, self).__init__(
+        concentration=array_ops.ones([], dtype=self._rate.dtype),
+        rate=self._rate,
+        allow_nan_stats=allow_nan_stats,
+        validate_args=validate_args,
+        name=name)
+    # While the Gamma distribution is not reparameterizable, the exponential
+    # distribution is.
+    self._reparameterization_type = True
+    self._parameters = parameters
+    self._graph_parents += [self._rate]
+
+  @staticmethod
+  def _param_shapes(sample_shape):
+    return {"rate": ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)}
+
+  @property
+  def rate(self):
+    return self._rate
+
+  def _sample_n(self, n, seed=None):
+    shape = array_ops.concat([[n], array_ops.shape(self._rate)], 0)
+    # Uniform variates must be sampled from the open-interval `(0, 1)` rather
+    # than `[0, 1)`. To do so, we use `np.finfo(self.dtype.as_numpy_dtype).tiny`
+    # because it is the smallest, positive, "normal" number. A "normal" number
+    # is such that the mantissa has an implicit leading 1. Normal, positive
+    # numbers x, y have the reasonable property that, `x + y >= max(x, y)`. In
+    # this case, a subnormal number (i.e., np.nextafter) can cause us to sample
+    # 0.
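+    # The samples below are generated via the inverse-CDF method: if
+    # U ~ Uniform(0, 1), then -log(U) / rate is Exponential(rate) distributed.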
+ sampled = random_ops.random_uniform( + shape, + minval=np.finfo(self.dtype.as_numpy_dtype).tiny, + maxval=1., + seed=seed, + dtype=self.dtype) + return -math_ops.log(sampled) / self._rate + + +class ExponentialWithSoftplusRate(Exponential): + """Exponential with softplus transform on `rate`.""" + + def __init__(self, + rate, + validate_args=False, + allow_nan_stats=True, + name="ExponentialWithSoftplusRate"): + parameters = locals() + with ops.name_scope(name, values=[rate]): + super(ExponentialWithSoftplusRate, self).__init__( + rate=nn.softplus(rate, name="softplus_rate"), + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + name=name) + self._parameters = parameters diff --git a/tensorflow/python/ops/distributions/gamma.py b/tensorflow/python/ops/distributions/gamma.py new file mode 100644 index 0000000000..4ac2b9b4ef --- /dev/null +++ b/tensorflow/python/ops/distributions/gamma.py @@ -0,0 +1,305 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""The Gamma distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import control_flow_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import kullback_leibler +from tensorflow.python.ops.distributions import util as distribution_util + + +__all__ = [ + "Gamma", + "GammaWithSoftplusConcentrationRate", +] + + +class Gamma(distribution.Distribution): + """Gamma distribution. + + The Gamma distribution is defined over positive real numbers using + parameters `concentration` (aka "alpha") and `rate` (aka "beta"). + + #### Mathematical Details + + The probability density function (pdf) is, + + ```none + pdf(x; alpha, beta, x > 0) = x**(alpha - 1) exp(-x beta) / Z + Z = Gamma(alpha) beta**alpha + ``` + + where: + + * `concentration = alpha`, `alpha > 0`, + * `rate = beta`, `beta > 0`, + * `Z` is the normalizing constant, and, + * `Gamma` is the [gamma function]( + https://en.wikipedia.org/wiki/Gamma_function). + + The cumulative density function (cdf) is, + + ```none + cdf(x; alpha, beta, x > 0) = GammaInc(alpha, beta x) / Gamma(alpha) + ``` + + where `GammaInc` is the [lower incomplete Gamma function]( + https://en.wikipedia.org/wiki/Incomplete_gamma_function). 
+ + The parameters can be intuited via their relationship to mean and stddev, + + ```none + concentration = alpha = (mean / stddev)**2 + rate = beta = mean / stddev**2 = concentration / mean + ``` + + Distribution parameters are automatically broadcast in all functions; see + examples for details. + + WARNING: This distribution may draw 0-valued samples for small `concentration` + values. See note in `tf.random_gamma` docstring. + + #### Examples + + ```python + dist = Gamma(concentration=3.0, rate=2.0) + dist2 = Gamma(concentration=[3.0, 4.0], rate=[2.0, 3.0]) + ``` + + """ + + def __init__(self, + concentration, + rate, + validate_args=False, + allow_nan_stats=True, + name="Gamma"): + """Construct Gamma with `concentration` and `rate` parameters. + + The parameters `concentration` and `rate` must be shaped in a way that + supports broadcasting (e.g. `concentration + rate` is a valid operation). + + Args: + concentration: Floating point tensor, the concentration params of the + distribution(s). Must contain only positive values. + rate: Floating point tensor, the inverse scale params of the + distribution(s). Must contain only positive values. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, statistics + (e.g., mean, mode, variance) use the value "`NaN`" to indicate the + result is undefined. When `False`, an exception is raised if one or + more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. + + Raises: + TypeError: if `concentration` and `rate` are different dtypes. 
+ """ + parameters = locals() + with ops.name_scope(name, values=[concentration, rate]): + with ops.control_dependencies([ + check_ops.assert_positive(concentration), + check_ops.assert_positive(rate), + ] if validate_args else []): + self._concentration = array_ops.identity( + concentration, name="concentration") + self._rate = array_ops.identity(rate, name="rate") + check_ops.assert_same_float_dtype( + [self._concentration, self._rate]) + super(Gamma, self).__init__( + dtype=self._concentration.dtype, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + parameters=parameters, + graph_parents=[self._concentration, + self._rate], + name=name) + + @staticmethod + def _param_shapes(sample_shape): + return dict( + zip(("concentration", "rate"), ([ops.convert_to_tensor( + sample_shape, dtype=dtypes.int32)] * 2))) + + @property + def concentration(self): + """Concentration parameter.""" + return self._concentration + + @property + def rate(self): + """Rate parameter.""" + return self._rate + + def _batch_shape_tensor(self): + return array_ops.broadcast_dynamic_shape( + array_ops.shape(self.concentration), + array_ops.shape(self.rate)) + + def _batch_shape(self): + return array_ops.broadcast_static_shape( + self.concentration.get_shape(), + self.rate.get_shape()) + + def _event_shape_tensor(self): + return constant_op.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + @distribution_util.AppendDocstring( + """Note: See `tf.random_gamma` docstring for sampling details and + caveats.""") + def _sample_n(self, n, seed=None): + return random_ops.random_gamma( + shape=[n], + alpha=self.concentration, + beta=self.rate, + dtype=self.dtype, + seed=seed) + + def _log_prob(self, x): + return self._log_unnormalized_prob(x) - self._log_normalization() + + def _prob(self, x): + return math_ops.exp(self._log_prob(x)) + + def _log_cdf(self, x): + return math_ops.log(self._cdf(x)) + + def _cdf(self, x): + x = self._maybe_assert_valid_sample(x) + # Note that igamma returns the regularized incomplete gamma function, + # which is what we want for the CDF. + return math_ops.igamma(self.concentration, self.rate * x) + + def _log_unnormalized_prob(self, x): + x = self._maybe_assert_valid_sample(x) + return (self.concentration - 1.) * math_ops.log(x) - self.rate * x + + def _log_normalization(self): + return (math_ops.lgamma(self.concentration) + - self.concentration * math_ops.log(self.rate)) + + def _entropy(self): + return (self.concentration + - math_ops.log(self.rate) + + math_ops.lgamma(self.concentration) + + ((1. - self.concentration) * + math_ops.digamma(self.concentration))) + + def _mean(self): + return self.concentration / self.rate + + def _variance(self): + return self.concentration / math_ops.square(self.rate) + + def _stddev(self): + return math_ops.sqrt(self.concentration) / self.rate + + @distribution_util.AppendDocstring( + """The mode of a gamma distribution is `(shape - 1) / rate` when + `shape > 1`, and `NaN` otherwise. If `self.allow_nan_stats` is `False`, + an exception will be raised rather than returning `NaN`.""") + def _mode(self): + mode = (self.concentration - 1.) 
/ self.rate + if self.allow_nan_stats: + nan = array_ops.fill( + self.batch_shape_tensor(), + np.array(np.nan, dtype=self.dtype.as_numpy_dtype()), + name="nan") + return array_ops.where(self.concentration > 1., mode, nan) + else: + return control_flow_ops.with_dependencies([ + check_ops.assert_less( + array_ops.ones([], self.dtype), + self.concentration, + message="mode not defined when any concentration <= 1"), + ], mode) + + def _maybe_assert_valid_sample(self, x): + check_ops.assert_same_float_dtype(tensors=[x], dtype=self.dtype) + if not self.validate_args: + return x + return control_flow_ops.with_dependencies([ + check_ops.assert_positive(x), + ], x) + + +class GammaWithSoftplusConcentrationRate(Gamma): + """`Gamma` with softplus of `concentration` and `rate`.""" + + def __init__(self, + concentration, + rate, + validate_args=False, + allow_nan_stats=True, + name="GammaWithSoftplusConcentrationRate"): + parameters = locals() + with ops.name_scope(name, values=[concentration, rate]): + super(GammaWithSoftplusConcentrationRate, self).__init__( + concentration=nn.softplus(concentration, + name="softplus_concentration"), + rate=nn.softplus(rate, name="softplus_rate"), + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + name=name) + self._parameters = parameters + + +@kullback_leibler.RegisterKL(Gamma, Gamma) +def _kl_gamma_gamma(g0, g1, name=None): + """Calculate the batched KL divergence KL(g0 || g1) with g0 and g1 Gamma. + + Args: + g0: instance of a Gamma distribution object. + g1: instance of a Gamma distribution object. + name: (optional) Name to use for created operations. + Default is "kl_gamma_gamma". + + Returns: + kl_gamma_gamma: `Tensor`. The batchwise KL(g0 || g1). + """ + with ops.name_scope(name, "kl_gamma_gamma", values=[ + g0.concentration, g0.rate, g1.concentration, g1.rate]): + # Result from: + # http://www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps + # For derivation see: + # http://stats.stackexchange.com/questions/11646/kullback-leibler-divergence-between-two-gamma-distributions pylint: disable=line-too-long + return (((g0.concentration - g1.concentration) + * math_ops.digamma(g0.concentration)) + + math_ops.lgamma(g1.concentration) + - math_ops.lgamma(g0.concentration) + + g1.concentration * math_ops.log(g0.rate) + - g1.concentration * math_ops.log(g1.rate) + + g0.concentration * (g1.rate / g0.rate - 1.)) diff --git a/tensorflow/python/ops/distributions/laplace.py b/tensorflow/python/ops/distributions/laplace.py new file mode 100644 index 0000000000..5c964ff78a --- /dev/null +++ b/tensorflow/python/ops/distributions/laplace.py @@ -0,0 +1,226 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ==============================================================================
+"""The Laplace distribution class."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+
+import numpy as np
+
+from tensorflow.python.framework import constant_op
+from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import check_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn
+from tensorflow.python.ops import random_ops
+from tensorflow.python.ops.distributions import distribution
+from tensorflow.python.ops.distributions import special_math
+
+
+__all__ = [
+    "Laplace",
+    "LaplaceWithSoftplusScale",
+]
+
+
+class Laplace(distribution.Distribution):
+  """The Laplace distribution with location `loc` and `scale` parameters.
+
+  #### Mathematical details
+
+  The probability density function (pdf) of this distribution is,
+
+  ```none
+  pdf(x; mu, sigma) = exp(-|x - mu| / sigma) / Z
+  Z = 2 sigma
+  ```
+
+  where `loc = mu`, `scale = sigma`, and `Z` is the normalization constant.
+
+  Note that the Laplace distribution can be thought of as two exponential
+  distributions spliced together "back-to-back."
+
+  The Laplace distribution is a member of the [location-scale family](
+  https://en.wikipedia.org/wiki/Location-scale_family), i.e., it can be
+  constructed as,
+
+  ```none
+  X ~ Laplace(loc=0, scale=1)
+  Y = loc + scale * X
+  ```
+
+  """
+
+  def __init__(self,
+               loc,
+               scale,
+               validate_args=False,
+               allow_nan_stats=True,
+               name="Laplace"):
+    """Construct Laplace distribution with parameters `loc` and `scale`.
+
+    The parameters `loc` and `scale` must be shaped in a way that supports
+    broadcasting (e.g., `loc / scale` is a valid operation).
+
+    Args:
+      loc: Floating point tensor which characterizes the location (center)
+        of the distribution.
+      scale: Positive floating point tensor which characterizes the spread of
+        the distribution.
+      validate_args: Python `bool`, default `False`. When `True` distribution
+        parameters are checked for validity despite possibly degrading runtime
+        performance. When `False` invalid inputs may silently render incorrect
+        outputs.
+      allow_nan_stats: Python `bool`, default `True`. When `True`,
+        statistics (e.g., mean, mode, variance) use the value "`NaN`" to
+        indicate the result is undefined. When `False`, an exception is raised
+        if one or more of the statistic's batch members are undefined.
+      name: Python `str` name prefixed to Ops created by this class.
+
+    Raises:
+      TypeError: if `loc` and `scale` are of different dtype.
+ """ + parameters = locals() + with ops.name_scope(name, values=[loc, scale]): + with ops.control_dependencies([check_ops.assert_positive(scale)] if + validate_args else []): + self._loc = array_ops.identity(loc, name="loc") + self._scale = array_ops.identity(scale, name="scale") + check_ops.assert_same_float_dtype([self._loc, self._scale]) + super(Laplace, self).__init__( + dtype=self._loc.dtype, + reparameterization_type=distribution.FULLY_REPARAMETERIZED, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + parameters=parameters, + graph_parents=[self._loc, self._scale], + name=name) + + @staticmethod + def _param_shapes(sample_shape): + return dict( + zip(("loc", "scale"), ([ops.convert_to_tensor( + sample_shape, dtype=dtypes.int32)] * 2))) + + @property + def loc(self): + """Distribution parameter for the location.""" + return self._loc + + @property + def scale(self): + """Distribution parameter for scale.""" + return self._scale + + def _batch_shape_tensor(self): + return array_ops.broadcast_dynamic_shape( + array_ops.shape(self.loc), array_ops.shape(self.scale)) + + def _batch_shape(self): + return array_ops.broadcast_static_shape( + self.loc.get_shape(), self.scale.get_shape()) + + def _event_shape_tensor(self): + return constant_op.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + def _sample_n(self, n, seed=None): + shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) + # Uniform variates must be sampled from the open-interval `(-1, 1)` rather + # than `[-1, 1)`. In the case of `(0, 1)` we'd use + # `np.finfo(self.dtype.as_numpy_dtype).tiny` because it is the smallest, + # positive, "normal" number. However, the concept of subnormality exists + # only at zero; here we need the smallest usable number larger than -1, + # i.e., `-1 + eps/2`. + uniform_samples = random_ops.random_uniform( + shape=shape, + minval=np.nextafter(self.dtype.as_numpy_dtype(-1.), + self.dtype.as_numpy_dtype(0.)), + maxval=1., + dtype=self.dtype, + seed=seed) + return (self.loc - self.scale * math_ops.sign(uniform_samples) * + math_ops.log1p(-math_ops.abs(uniform_samples))) + + def _log_prob(self, x): + return self._log_unnormalized_prob(x) - self._log_normalization() + + def _prob(self, x): + return math_ops.exp(self._log_prob(x)) + + def _log_cdf(self, x): + return special_math.log_cdf_laplace(self._z(x)) + + def _log_survival_function(self, x): + return special_math.log_cdf_laplace(-self._z(x)) + + def _cdf(self, x): + z = self._z(x) + return (0.5 + 0.5 * math_ops.sign(z) * + (1. - math_ops.exp(-math_ops.abs(z)))) + + def _log_unnormalized_prob(self, x): + return -math_ops.abs(self._z(x)) + + def _log_normalization(self): + return math.log(2.) + math_ops.log(self.scale) + + def _entropy(self): + # Use broadcasting rules to calculate the full broadcast scale. + scale = self.scale + array_ops.zeros_like(self.loc) + return math.log(2.) + 1. + math_ops.log(scale) + + def _mean(self): + return self.loc + array_ops.zeros_like(self.scale) + + def _stddev(self): + return math.sqrt(2.) 
* self.scale + array_ops.zeros_like(self.loc) + + def _median(self): + return self._mean() + + def _mode(self): + return self._mean() + + def _z(self, x): + return (x - self.loc) / self.scale + + +class LaplaceWithSoftplusScale(Laplace): + """Laplace with softplus applied to `scale`.""" + + def __init__(self, + loc, + scale, + validate_args=False, + allow_nan_stats=True, + name="LaplaceWithSoftplusScale"): + parameters = locals() + with ops.name_scope(name, values=[loc, scale]): + super(LaplaceWithSoftplusScale, self).__init__( + loc=loc, + scale=nn.softplus(scale, name="softplus_scale"), + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + name=name) + self._parameters = parameters diff --git a/tensorflow/python/ops/distributions/multinomial.py b/tensorflow/python/ops/distributions/multinomial.py new file mode 100644 index 0000000000..a5bea7b4ba --- /dev/null +++ b/tensorflow/python/ops/distributions/multinomial.py @@ -0,0 +1,291 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""The Multinomial distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import control_flow_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import util as distribution_util + + +__all__ = [ + "Multinomial", +] + + +_multinomial_sample_note = """For each batch of counts, `value = [n_0, ... +,n_{k-1}]`, `P[value]` is the probability that after sampling `self.total_count` +draws from this Multinomial distribution, the number of draws falling in class +`j` is `n_j`. Since this definition is [exchangeable]( +https://en.wikipedia.org/wiki/Exchangeable_random_variables); different +sequences have the same counts so the probability includes a combinatorial +coefficient. + +Note: `value` must be a non-negative tensor with dtype `self.dtype`, have no +fractional components, and such that +`tf.reduce_sum(value, -1) = self.total_count`. Its shape must be broadcastable +with `self.probs` and `self.total_count`.""" + + +class Multinomial(distribution.Distribution): + """Multinomial distribution. + + This Multinomial distribution is parameterized by `probs`, a (batch of) + length-`k` `prob` (probability) vectors (`k > 1`) such that + `tf.reduce_sum(probs, -1) = 1`, and a `total_count` number of trials, i.e., + the number of trials per draw from the Multinomial. It is defined over a + (batch of) length-`k` vector `counts` such that + `tf.reduce_sum(counts, -1) = total_count`. 
The Multinomial is identically the
+  Binomial distribution when `k = 2`.
+
+  #### Mathematical Details
+
+  The Multinomial is a distribution over `k`-class counts, i.e., a length-`k`
+  vector of non-negative integer `counts = n = [n_0, ..., n_{k-1}]`.
+
+  The probability mass function (pmf) is,
+
+  ```none
+  pmf(n; pi, N) = prod_j (pi_j)**n_j / Z
+  Z = (prod_j n_j!) / N!
+  ```
+
+  where:
+  * `probs = pi = [pi_0, ..., pi_{k-1}]`, `pi_j > 0`, `sum_j pi_j = 1`,
+  * `total_count = N`, `N` a positive integer,
+  * `Z` is the normalization constant, and,
+  * `N!` denotes `N` factorial.
+
+  Distribution parameters are automatically broadcast in all functions; see
+  examples for details.
+
+  #### Examples
+
+  Create a 3-class distribution, with the 3rd class the most likely to be
+  drawn, using logits.
+
+  ```python
+  logits = [-50., -43, 0]
+  dist = Multinomial(total_count=4., logits=logits)
+  ```
+
+  Create a 3-class distribution, with the 3rd class the most likely to be
+  drawn.
+
+  ```python
+  p = [.2, .3, .5]
+  dist = Multinomial(total_count=4., probs=p)
+  ```
+
+  The distribution functions can be evaluated on counts.
+
+  ```python
+  # counts same shape as p.
+  counts = [1., 0, 3]
+  dist.prob(counts)  # Shape []
+
+  # p will be broadcast to [[.2, .3, .5], [.2, .3, .5]] to match counts.
+  counts = [[1., 2, 1], [2, 2, 0]]
+  dist.prob(counts)  # Shape [2]
+
+  # p will be broadcast to shape [5, 7, 3] to match counts.
+  counts = [[...]]  # Shape [5, 7, 3]
+  dist.prob(counts)  # Shape [5, 7]
+  ```
+
+  Create a 2-batch of 3-class distributions.
+
+  ```python
+  p = [[.1, .2, .7], [.3, .3, .4]]  # Shape [2, 3]
+  dist = Multinomial(total_count=[4., 5], probs=p)
+
+  counts = [[2., 1, 1], [3, 1, 1]]
+  dist.prob(counts)  # Shape [2]
+  ```
+  """
+
+  def __init__(self,
+               total_count,
+               logits=None,
+               probs=None,
+               validate_args=False,
+               allow_nan_stats=True,
+               name="Multinomial"):
+    """Initialize a batch of Multinomial distributions.
+
+    Args:
+      total_count: Non-negative floating point tensor with shape broadcastable
+        to `[N1,..., Nm]` with `m >= 0`. Defines this as a batch of
+        `N1 x ... x Nm` different Multinomial distributions. Its components
+        should be equal to integer values.
+      logits: Floating point tensor representing the log-odds of a
+        positive event with shape broadcastable to `[N1,..., Nm, k], m >= 0`,
+        and the same dtype as `total_count`. Defines this as a batch of
+        `N1 x ... x Nm` different `k` class Multinomial distributions. Only one
+        of `logits` or `probs` should be passed in.
+      probs: Positive floating point tensor with shape broadcastable to
+        `[N1,..., Nm, k]` `m >= 0` and same dtype as `total_count`. Defines
+        this as a batch of `N1 x ... x Nm` different `k` class Multinomial
+        distributions. `probs`'s components in the last portion of its shape
+        should sum to `1`. Only one of `logits` or `probs` should be passed in.
+      validate_args: Python `bool`, default `False`. When `True` distribution
+        parameters are checked for validity despite possibly degrading runtime
+        performance. When `False` invalid inputs may silently render incorrect
+        outputs.
+      allow_nan_stats: Python `bool`, default `True`. When `True`, statistics
+        (e.g., mean, mode, variance) use the value "`NaN`" to indicate the
+        result is undefined. When `False`, an exception is raised if one or
+        more of the statistic's batch members are undefined.
+      name: Python `str` name prefixed to Ops created by this class.
+ """ + parameters = locals() + with ops.name_scope(name, values=[total_count, logits, probs]): + self._total_count = self._maybe_assert_valid_total_count( + ops.convert_to_tensor(total_count, name="total_count"), + validate_args) + self._logits, self._probs = distribution_util.get_logits_and_probs( + logits=logits, + probs=probs, + multidimensional=True, + validate_args=validate_args, + name=name) + self._mean_val = self._total_count[..., array_ops.newaxis] * self._probs + super(Multinomial, self).__init__( + dtype=self._probs.dtype, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + parameters=parameters, + graph_parents=[self._total_count, + self._logits, + self._probs], + name=name) + + @property + def total_count(self): + """Number of trials used to construct a sample.""" + return self._total_count + + @property + def logits(self): + """Vector of coordinatewise logits.""" + return self._logits + + @property + def probs(self): + """Probability of drawing a `1` in that coordinate.""" + return self._probs + + def _batch_shape_tensor(self): + return array_ops.shape(self._mean_val)[:-1] + + def _batch_shape(self): + return self._mean_val.get_shape().with_rank_at_least(1)[:-1] + + def _event_shape_tensor(self): + return array_ops.shape(self._mean_val)[-1:] + + def _event_shape(self): + return self._mean_val.get_shape().with_rank_at_least(1)[-1:] + + def _sample_n(self, n, seed=None): + n_draws = math_ops.cast(self.total_count, dtype=dtypes.int32) + if self.total_count.get_shape().ndims is not None: + if self.total_count.get_shape().ndims != 0: + raise NotImplementedError( + "Sample only supported for scalar number of draws.") + elif self.validate_args: + is_scalar = check_ops.assert_rank( + n_draws, 0, + message="Sample only supported for scalar number of draws.") + n_draws = control_flow_ops.with_dependencies([is_scalar], n_draws) + k = self.event_shape_tensor()[0] + # Flatten batch dims so logits has shape [B, k], + # where B = reduce_prod(self.batch_shape_tensor()).
+ draws = random_ops.multinomial( + logits=array_ops.reshape(self.logits, [-1, k]), + num_samples=n * n_draws, + seed=seed) + draws = array_ops.reshape(draws, shape=[-1, n, n_draws]) + x = math_ops.reduce_sum(array_ops.one_hot(draws, depth=k), + axis=-2) # shape: [B, n, k] + x = array_ops.transpose(x, perm=[1, 0, 2]) + final_shape = array_ops.concat([[n], self.batch_shape_tensor(), [k]], 0) + return array_ops.reshape(x, final_shape) + + @distribution_util.AppendDocstring(_multinomial_sample_note) + def _log_prob(self, counts): + return self._log_unnormalized_prob(counts) - self._log_normalization(counts) + + @distribution_util.AppendDocstring(_multinomial_sample_note) + def _prob(self, counts): + return math_ops.exp(self._log_prob(counts)) + + def _log_unnormalized_prob(self, counts): + counts = self._maybe_assert_valid_sample(counts) + return math_ops.reduce_sum(counts * math_ops.log(self.probs), -1) + + def _log_normalization(self, counts): + counts = self._maybe_assert_valid_sample(counts) + return -distribution_util.log_combinations(self.total_count, counts) + + def _mean(self): + return array_ops.identity(self._mean_val) + + def _covariance(self): + p = self.probs * array_ops.ones_like( + self.total_count)[..., array_ops.newaxis] + return array_ops.matrix_set_diag( + -math_ops.matmul(self._mean_val[..., array_ops.newaxis], + p[..., array_ops.newaxis, :]), # outer product + self._variance()) + + def _variance(self): + p = self.probs * array_ops.ones_like( + self.total_count)[..., array_ops.newaxis] + return self._mean_val - self._mean_val * p + + def _maybe_assert_valid_total_count(self, total_count, validate_args): + if not validate_args: + return total_count + return control_flow_ops.with_dependencies([ + check_ops.assert_non_negative( + total_count, + message="total_count must be non-negative."), + distribution_util.assert_integer_form( + total_count, + message="total_count cannot contain fractional values."), + ], total_count) + + def _maybe_assert_valid_sample(self, counts): + """Check counts for proper shape, values, then return tensor version.""" + if not self.validate_args: + return counts + + counts = distribution_util.embed_check_nonnegative_discrete( + counts, check_integer=True) + return control_flow_ops.with_dependencies([ + check_ops.assert_equal( + self.total_count, math_ops.reduce_sum(counts, -1), + message="counts must sum to `self.total_count`"), + ], counts) diff --git a/tensorflow/python/ops/distributions/normal.py b/tensorflow/python/ops/distributions/normal.py index 4c531b0378..0ef1c91df8 100644 --- a/tensorflow/python/ops/distributions/normal.py +++ b/tensorflow/python/ops/distributions/normal.py @@ -70,14 +70,14 @@ class Normal(distribution.Distribution): ```python # Define a single scalar Normal distribution. - dist = tf.contrib.distributions.Normal(loc=0., scale=3.) + dist = tf.distributions.Normal(loc=0., scale=3.) # Evaluate the cdf at 1, returning a scalar. dist.cdf(1.) # Define a batch of two scalar valued Normals. # The first has mean 1 and standard deviation 11, the second 2 and 22. - dist = tf.contrib.distributions.Normal(loc=[1, 2.], scale=[11, 22.]) + dist = tf.distributions.Normal(loc=[1, 2.], scale=[11, 22.]) # Evaluate the pdf of the first distribution on 0, and the second on 1.5, # returning a length two tensor. @@ -92,7 +92,7 @@ class Normal(distribution.Distribution): ```python # Define a batch of two scalar valued Normals. # Both have mean 1, but different standard deviations. 
- dist = tf.contrib.distributions.Normal(loc=1., scale=[11, 22.]) + dist = tf.distributions.Normal(loc=1., scale=[11, 22.]) # Evaluate the pdf of both distributions on the same point, 3.0, # returning a length 2 tensor. diff --git a/tensorflow/python/ops/distributions/student_t.py b/tensorflow/python/ops/distributions/student_t.py new file mode 100644 index 0000000000..073ac4286b --- /dev/null +++ b/tensorflow/python/ops/distributions/student_t.py @@ -0,0 +1,362 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Student's t distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import control_flow_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import nn +from tensorflow.python.ops import random_ops +from tensorflow.python.ops import special_math_ops +from tensorflow.python.ops.distributions import distribution +from tensorflow.python.ops.distributions import util as distribution_util + + +__all__ = [ + "StudentT", + "StudentTWithAbsDfSoftplusScale", +] + + +class StudentT(distribution.Distribution): + """Student's t-distribution. + + This distribution has parameters: degree of freedom `df`, location `loc`, + and `scale`. + + #### Mathematical details + + The probability density function (pdf) is, + + ```none + pdf(x; df, mu, sigma) = (1 + y**2 / df)**(-0.5 (df + 1)) / Z + where, + y = (x - mu) / sigma + Z = abs(sigma) sqrt(df pi) Gamma(0.5 df) / Gamma(0.5 (df + 1)) + ``` + + where: + * `loc = mu`, + * `scale = sigma`, and, + * `Z` is the normalization constant, and, + * `Gamma` is the [gamma function]( + https://en.wikipedia.org/wiki/Gamma_function). + + The StudentT distribution is a member of the [location-scale family]( + https://en.wikipedia.org/wiki/Location-scale_family), i.e., it can be + constructed as, + + ```none + X ~ StudentT(df, loc=0, scale=1) + Y = loc + scale * X + ``` + + Notice that `scale` has semantics more similar to standard deviation than + variance. However it is not actually the std. deviation; the Student's + t-distribution std. dev. is `scale sqrt(df / (df - 2))` when `df > 2`. + + #### Examples + + Examples of initialization of one or a batch of distributions. + + ```python + # Define a single scalar Student t distribution. + single_dist = tf.distributions.StudentT(df=3) + + # Evaluate the pdf at 1, returning a scalar Tensor. + single_dist.prob(1.) + + # Define a batch of two scalar valued Student t's. + # The first has degrees of freedom 2, mean 1, and scale 11. 
+ # The second 3, 2 and 22. + multi_dist = tf.distributions.StudentT(df=[2, 3], + loc=[1, 2.], + scale=[11, 22.]) + + # Evaluate the pdf of the first distribution on 0, and the second on 1.5, + # returning a length two tensor. + multi_dist.prob([0, 1.5]) + + # Get 3 samples, returning a 3 x 2 tensor. + multi_dist.sample(3) + ``` + + Arguments are broadcast when possible. + + ```python + # Define a batch of two Student's t distributions. + # Both have df 2 and mean 1, but different scales. + dist = tf.distributions.StudentT(df=2, loc=1, scale=[11, 22.]) + + # Evaluate the pdf of both distributions on the same point, 3.0, + # returning a length 2 tensor. + dist.prob(3.0) + ``` + + """ + # pylint: enable=line-too-long + + def __init__(self, + df, + loc, + scale, + validate_args=False, + allow_nan_stats=True, + name="StudentT"): + """Construct Student's t distributions. + + The distributions have degree of freedom `df`, mean `loc`, and scale + `scale`. + + The parameters `df`, `loc`, and `scale` must be shaped in a way that + supports broadcasting (e.g. `df + loc + scale` is a valid operation). + + Args: + df: Floating-point `Tensor`. The degrees of freedom of the + distribution(s). `df` must contain only positive values. + loc: Floating-point `Tensor`. The mean(s) of the distribution(s). + scale: Floating-point `Tensor`. The scaling factor(s) for the + distribution(s). Note that `scale` is not technically the standard + deviation of this distribution but has semantics more similar to + standard deviation than variance. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, + statistics (e.g., mean, mode, variance) use the value "`NaN`" to + indicate the result is undefined. When `False`, an exception is raised + if one or more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. + + Raises: + TypeError: if loc and scale are different dtypes. 
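To make the location-scale claim in the class docstring concrete, the following sketch (assuming the `tf.distributions.StudentT` endpoint and TF 1.x) checks that shifting and scaling a standardized Student's t matches the parameterized density up to the usual `log(scale)` Jacobian term.

```python
import tensorflow as tf

df, loc, scale = 5., 1., 2.
standard = tf.distributions.StudentT(df=df, loc=0., scale=1.)
shifted = tf.distributions.StudentT(df=df, loc=loc, scale=scale)

x = tf.constant([-1., 0.5, 3.])
# Y = loc + scale * X  =>  log p_Y(x) = log p_X((x - loc) / scale) - log(scale)
lhs = shifted.log_prob(x)
rhs = standard.log_prob((x - loc) / scale) - tf.log(scale)

with tf.Session() as sess:
  print(sess.run([lhs, rhs]))  # the two rows agree up to float rounding
```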
+ """ + parameters = locals() + with ops.name_scope(name, values=[df, loc, scale]): + with ops.control_dependencies([check_ops.assert_positive(df)] + if validate_args else []): + self._df = array_ops.identity(df, name="df") + self._loc = array_ops.identity(loc, name="loc") + self._scale = array_ops.identity(scale, name="scale") + check_ops.assert_same_float_dtype( + (self._df, self._loc, self._scale)) + super(StudentT, self).__init__( + dtype=self._scale.dtype, + reparameterization_type=distribution.NOT_REPARAMETERIZED, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + parameters=parameters, + graph_parents=[self._df, self._loc, self._scale], + name=name) + + @staticmethod + def _param_shapes(sample_shape): + return dict( + zip(("df", "loc", "scale"), ( + [ops.convert_to_tensor( + sample_shape, dtype=dtypes.int32)] * 3))) + + @property + def df(self): + """Degrees of freedom in these Student's t distribution(s).""" + return self._df + + @property + def loc(self): + """Locations of these Student's t distribution(s).""" + return self._loc + + @property + def scale(self): + """Scaling factors of these Student's t distribution(s).""" + return self._scale + + def _batch_shape_tensor(self): + return array_ops.broadcast_dynamic_shape( + array_ops.shape(self.df), + array_ops.broadcast_dynamic_shape( + array_ops.shape(self.loc), array_ops.shape(self.scale))) + + def _batch_shape(self): + return array_ops.broadcast_static_shape( + array_ops.broadcast_static_shape(self.df.get_shape(), + self.loc.get_shape()), + self.scale.get_shape()) + + def _event_shape_tensor(self): + return constant_op.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + def _sample_n(self, n, seed=None): + # The sampling method comes from the fact that if: + # X ~ Normal(0, 1) + # Z ~ Chi2(df) + # Y = X / sqrt(Z / df) + # then: + # Y ~ StudentT(df). + shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) + normal_sample = random_ops.random_normal(shape, dtype=self.dtype, seed=seed) + df = self.df * array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype) + gamma_sample = random_ops.random_gamma( + [n], + 0.5 * df, + beta=0.5, + dtype=self.dtype, + seed=distribution_util.gen_new_seed(seed, salt="student_t")) + samples = normal_sample * math_ops.rsqrt(gamma_sample / df) + return samples * self.scale + self.loc # Abs(scale) not wanted. + + def _log_prob(self, x): + return self._log_unnormalized_prob(x) - self._log_normalization() + + def _log_unnormalized_prob(self, x): + y = (x - self.loc) / self.scale # Abs(scale) superfluous. + return -0.5 * (self.df + 1.) * math_ops.log1p(y**2. / self.df) + + def _log_normalization(self): + return (math_ops.log(math_ops.abs(self.scale)) + + 0.5 * math_ops.log(self.df) + + 0.5 * np.log(np.pi) + + math_ops.lgamma(0.5 * self.df) - + math_ops.lgamma(0.5 * (self.df + 1.))) + + def _prob(self, x): + return math_ops.exp(self._log_prob(x)) + + def _cdf(self, x): + # Take Abs(scale) to make subsequent where work correctly. + y = (x - self.loc) / math_ops.abs(self.scale) + x_t = self.df / (y**2. + self.df) + neg_cdf = 0.5 * math_ops.betainc(0.5 * self.df, 0.5, x_t) + return array_ops.where(math_ops.less(y, 0.), neg_cdf, 1. - neg_cdf) + + def _entropy(self): + v = array_ops.ones(self.batch_shape_tensor(), + dtype=self.dtype)[..., array_ops.newaxis] + u = v * self.df[..., array_ops.newaxis] + beta_arg = array_ops.concat([u, v], -1) / 2.
+ return (math_ops.log(math_ops.abs(self.scale)) + + 0.5 * math_ops.log(self.df) + + special_math_ops.lbeta(beta_arg) + + 0.5 * (self.df + 1.) * + (math_ops.digamma(0.5 * (self.df + 1.)) - + math_ops.digamma(0.5 * self.df))) + + @distribution_util.AppendDocstring( + """The mean of Student's T equals `loc` if `df > 1`, otherwise it is + `NaN`. If `self.allow_nan_stats=False`, then an exception will be raised + rather than returning `NaN`.""") + def _mean(self): + mean = self.loc * array_ops.ones(self.batch_shape_tensor(), + dtype=self.dtype) + if self.allow_nan_stats: + nan = np.array(np.nan, dtype=self.dtype.as_numpy_dtype()) + return array_ops.where( + math_ops.greater( + self.df, + array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype)), + mean, + array_ops.fill(self.batch_shape_tensor(), nan, name="nan")) + else: + return control_flow_ops.with_dependencies( + [ + check_ops.assert_less( + array_ops.ones([], dtype=self.dtype), + self.df, + message="mean not defined for components of df <= 1"), + ], + mean) + + @distribution_util.AppendDocstring(""" + The variance for Student's T equals + + ``` + df / (df - 2), when df > 2 + infinity, when 1 < df <= 2 + NaN, when df <= 1 + ``` + """) + def _variance(self): + # We need to put the tf.where inside the outer tf.where to ensure we never + # hit a NaN in the gradient. + denom = array_ops.where(math_ops.greater(self.df, 2.), + self.df - 2., + array_ops.ones_like(self.df)) + # Abs(scale) superfluous. + var = (array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype) * + math_ops.square(self.scale) * self.df / denom) + # When 1 < df <= 2, variance is infinite. + inf = np.array(np.inf, dtype=self.dtype.as_numpy_dtype()) + result_where_defined = array_ops.where( + self.df > array_ops.fill(self.batch_shape_tensor(), 2.), + var, + array_ops.fill(self.batch_shape_tensor(), inf, name="inf")) + + if self.allow_nan_stats: + nan = np.array(np.nan, dtype=self.dtype.as_numpy_dtype()) + return array_ops.where( + math_ops.greater( + self.df, + array_ops.ones(self.batch_shape_tensor(), dtype=self.dtype)), + result_where_defined, + array_ops.fill(self.batch_shape_tensor(), nan, name="nan")) + else: + return control_flow_ops.with_dependencies( + [ + check_ops.assert_less( + array_ops.ones([], dtype=self.dtype), + self.df, + message="variance not defined for components of df <= 1"), + ], + result_where_defined) + + def _mode(self): + return array_ops.identity(self.loc) + + +class StudentTWithAbsDfSoftplusScale(StudentT): + """StudentT with `df = floor(abs(df))` and `scale = softplus(scale)`.""" + + def __init__(self, + df, + loc, + scale, + validate_args=False, + allow_nan_stats=True, + name="StudentTWithAbsDfSoftplusScale"): + parameters = locals() + with ops.name_scope(name, values=[df, scale]): + super(StudentTWithAbsDfSoftplusScale, self).__init__( + df=math_ops.floor(math_ops.abs(df)), + loc=loc, + scale=nn.softplus(scale, name="softplus_scale"), + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + name=name) + self._parameters = parameters diff --git a/tensorflow/python/ops/distributions/uniform.py b/tensorflow/python/ops/distributions/uniform.py new file mode 100644 index 0000000000..9b555f87ea --- /dev/null +++ b/tensorflow/python/ops/distributions/uniform.py @@ -0,0 +1,202 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""The Uniform distribution class.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math + +from tensorflow.python.framework import constant_op +from tensorflow.python.framework import dtypes +from tensorflow.python.framework import ops +from tensorflow.python.framework import tensor_shape +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import random_ops +from tensorflow.python.ops.distributions import distribution + + +class Uniform(distribution.Distribution): + """Uniform distribution with `low` and `high` parameters. + + #### Mathematical Details + + The probability density function (pdf) is, + + ```none + pdf(x; a, b) = I[a <= x < b] / Z + Z = b - a + ``` + + where: + * `low = a`, + * `high = b`, + * `Z` is the normalizing constant, and, + * `I[predicate]` is the [indicator function]( + https://en.wikipedia.org/wiki/Indicator_function) for `predicate`. + + The parameters `low` and `high` must be shaped in a way that supports + broadcasting (e.g., `high - low` is a valid operation). + + #### Examples + + ```python + # Without broadcasting: + u1 = Uniform(low=3.0, high=4.0) # a single uniform distribution [3, 4] + u2 = Uniform(low=[1.0, 2.0], + high=[3.0, 4.0]) # 2 distributions [1, 3], [2, 4] + u3 = Uniform(low=[[1.0, 2.0], + [3.0, 4.0]], + high=[[1.5, 2.5], + [3.5, 4.5]]) # 4 distributions + ``` + + ```python + # With broadcasting: + u1 = Uniform(low=3.0, high=[5.0, 6.0, 7.0]) # 3 distributions + ``` + + """ + + def __init__(self, + low=0., + high=1., + validate_args=False, + allow_nan_stats=True, + name="Uniform"): + """Initialize a batch of Uniform distributions. + + Args: + low: Floating point tensor, lower boundary of the output interval. Must + have `low < high`. + high: Floating point tensor, upper boundary of the output interval. Must + have `low < high`. + validate_args: Python `bool`, default `False`. When `True` distribution + parameters are checked for validity despite possibly degrading runtime + performance. When `False` invalid inputs may silently render incorrect + outputs. + allow_nan_stats: Python `bool`, default `True`. When `True`, statistics + (e.g., mean, mode, variance) use the value "`NaN`" to indicate the + result is undefined. When `False`, an exception is raised if one or + more of the statistic's batch members are undefined. + name: Python `str` name prefixed to Ops created by this class. + + Raises: + InvalidArgumentError: if `low >= high` and `validate_args=True`.
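A short sketch of the broadcasting behaviour described above, again assuming the `tf.distributions.Uniform` endpoint and TF 1.x graph mode; the printed values follow directly from `range = high - low`.

```python
import tensorflow as tf

# One `low`, three `high`s -> a batch of three Uniform distributions.
u = tf.distributions.Uniform(low=3.0, high=[5.0, 6.0, 7.0])

x = tf.constant([4.0, 4.0, 4.0])
with tf.Session() as sess:
  print(sess.run(u.range()))  # [2., 3., 4.]   == high - low
  print(sess.run(u.prob(x)))  # [0.5, 0.33, 0.25]  == 1 / range inside [low, high)
  print(sess.run(u.cdf(x)))   # [0.5, 0.33, 0.25]  == (x - low) / range
  print(sess.run(u.mean()))   # [4., 4.5, 5.]  == (low + high) / 2
```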
+ """ + parameters = locals() + with ops.name_scope(name, values=[low, high]): + with ops.control_dependencies([ + check_ops.assert_less( + low, high, message="uniform not defined when low >= high.") + ] if validate_args else []): + self._low = array_ops.identity(low, name="low") + self._high = array_ops.identity(high, name="high") + check_ops.assert_same_float_dtype([self._low, self._high]) + super(Uniform, self).__init__( + dtype=self._low.dtype, + reparameterization_type=distribution.FULLY_REPARAMETERIZED, + validate_args=validate_args, + allow_nan_stats=allow_nan_stats, + parameters=parameters, + graph_parents=[self._low, + self._high], + name=name) + + @staticmethod + def _param_shapes(sample_shape): + return dict( + zip(("low", "high"), + ([ops.convert_to_tensor(sample_shape, dtype=dtypes.int32)] * 2))) + + @property + def low(self): + """Lower boundary of the output interval.""" + return self._low + + @property + def high(self): + """Upper boundary of the output interval.""" + return self._high + + def range(self, name="range"): + """`high - low`.""" + with self._name_scope(name): + return self.high - self.low + + def _batch_shape_tensor(self): + return array_ops.broadcast_dynamic_shape( + array_ops.shape(self.low), + array_ops.shape(self.high)) + + def _batch_shape(self): + return array_ops.broadcast_static_shape( + self.low.get_shape(), + self.high.get_shape()) + + def _event_shape_tensor(self): + return constant_op.constant([], dtype=dtypes.int32) + + def _event_shape(self): + return tensor_shape.scalar() + + def _sample_n(self, n, seed=None): + shape = array_ops.concat([[n], self.batch_shape_tensor()], 0) + samples = random_ops.random_uniform(shape=shape, + dtype=self.dtype, + seed=seed) + return self.low + self.range() * samples + + def _log_prob(self, x): + return math_ops.log(self._prob(x)) + + def _prob(self, x): + broadcasted_x = x * array_ops.ones(self.batch_shape_tensor()) + return array_ops.where( + math_ops.is_nan(broadcasted_x), + broadcasted_x, + array_ops.where( + math_ops.logical_or(broadcasted_x < self.low, + broadcasted_x >= self.high), + array_ops.zeros_like(broadcasted_x), + array_ops.ones_like(broadcasted_x) / self.range())) + + def _log_cdf(self, x): + return math_ops.log(self.cdf(x)) + + def _cdf(self, x): + broadcast_shape = array_ops.broadcast_dynamic_shape( + array_ops.shape(x), self.batch_shape_tensor()) + zeros = array_ops.zeros(broadcast_shape, dtype=self.dtype) + ones = array_ops.ones(broadcast_shape, dtype=self.dtype) + broadcasted_x = x * ones + result_if_not_big = array_ops.where( + x < self.low, zeros, (broadcasted_x - self.low) / self.range()) + return array_ops.where(x >= self.high, ones, result_if_not_big) + + def _entropy(self): + return math_ops.log(self.range()) + + def _mean(self): + return (self.low + self.high) / 2. + + def _variance(self): + return math_ops.square(self.range()) / 12. + + def _stddev(self): + return self.range() / math.sqrt(12.) diff --git a/tensorflow/python/ops/losses/util.py b/tensorflow/python/ops/losses/util.py index 09ad874fae..3414df475f 100644 --- a/tensorflow/python/ops/losses/util.py +++ b/tensorflow/python/ops/losses/util.py @@ -57,7 +57,7 @@ def get_losses(scope=None, loss_collection=ops.GraphKeys.LOSSES): def get_regularization_losses(scope=None): - """Gets the regularization losses. + """Gets the list of regularization losses. Args: scope: An optional scope for filtering the losses to return. 
@@ -88,7 +88,11 @@ def get_regularization_loss(scope=None, name="total_regularization_loss"): def get_total_loss(add_regularization_losses=True, name="total_loss"): """Returns a tensor whose value represents the total loss. - Notice that the function adds the given losses to the regularization losses. + In particular, this adds any losses you have added with `tf.losses.add_loss()` to + any regularization losses that have been added by regularization parameters + on layer constructors, e.g. `tf.layers`. Be very sure to use this if you + are constructing a loss_op manually. Otherwise, regularization arguments + on `tf.layers` methods will not function. Args: add_regularization_losses: A boolean indicating whether or not to use the diff --git a/tensorflow/python/ops/weights_broadcast_ops.py b/tensorflow/python/ops/weights_broadcast_ops.py index 257b9f1faa..35e93249c3 100644 --- a/tensorflow/python/ops/weights_broadcast_ops.py +++ b/tensorflow/python/ops/weights_broadcast_ops.py @@ -97,9 +97,10 @@ def assert_broadcastable(weights, values): return control_flow_ops.no_op(name="static_scalar_check_success") if weights_rank_static != values_rank_static: raise ValueError( - "%s values.rank=%s. weights.rank=%s." % ( + "%s values.rank=%s. weights.rank=%s." + " values.shape=%s. weights.shape=%s." % ( _ASSERT_BROADCASTABLE_ERROR_PREFIX, values_rank_static, - weights_rank_static)) + weights_rank_static, values.shape, weights.shape)) weights_shape_static = tensor_util.constant_value(weights_shape) values_shape_static = tensor_util.constant_value(values_shape) if weights_shape_static is not None and values_shape_static is not None: diff --git a/tensorflow/python/training/session_manager.py b/tensorflow/python/training/session_manager.py index 6bcc6e25c3..a13b6dd976 100644 --- a/tensorflow/python/training/session_manager.py +++ b/tensorflow/python/training/session_manager.py @@ -27,6 +27,23 @@ from tensorflow.python.platform import tf_logging as logging from tensorflow.python.training import saver as saver_mod +def _maybe_name(obj): + """Returns object name if it has one, or a message otherwise. + + This is useful for names that appear in error messages. + Args: + obj: Object to get the name of. + Returns: + name, "None", or a "no name" message. + """ + if obj is None: + return "None" + elif hasattr(obj, "name"): + return obj.name + else: + return "<no name for %s>" % type(obj) + + class SessionManager(object): """Training helper that restores from checkpoint and creates session. @@ -267,8 +284,8 @@ class SessionManager(object): if not local_init_success: raise RuntimeError( "Init operations did not make model ready for local_init. " - "Init op: %s, init fn: %s, error: %s" % ("None" if init_op is None - else init_op.name, init_fn, + "Init op: %s, init fn: %s, error: %s" % (_maybe_name(init_op), + init_fn, msg)) is_ready, msg = self._model_ready(sess) @@ -276,8 +293,7 @@ class SessionManager(object): raise RuntimeError( "Init operations did not make model ready.
" "Init op: %s, init fn: %s, local_init_op: %s, error: %s" % - (None if init_op is None else init_op.name, init_fn, - self._local_init_op, msg)) + (_maybe_name(init_op), init_fn, self._local_init_op, msg)) return sess def recover_session(self, diff --git a/tensorflow/python/training/session_manager_test.py b/tensorflow/python/training/session_manager_test.py index 246e95110a..4dc1d5abb7 100644 --- a/tensorflow/python/training/session_manager_test.py +++ b/tensorflow/python/training/session_manager_test.py @@ -497,6 +497,23 @@ class SessionManagerTest(test.TestCase): "Init operations did not make model ready"): sm2.prepare_session("", init_op=v.initializer) + def testPrepareSessionDidNotInitLocalVariableList(self): + with ops.Graph().as_default(): + v = variables.Variable(1, name="v") + w = variables.Variable( + v, + trainable=False, + collections=[ops.GraphKeys.LOCAL_VARIABLES], + name="w") + with self.test_session(): + self.assertEqual(False, variables.is_variable_initialized(v).eval()) + self.assertEqual(False, variables.is_variable_initialized(w).eval()) + sm2 = session_manager.SessionManager( + ready_op=variables.report_uninitialized_variables()) + with self.assertRaisesRegexp(RuntimeError, + "Init operations did not make model ready"): + sm2.prepare_session("", init_op=[v.initializer]) + def testPrepareSessionWithReadyNotReadyForLocal(self): with ops.Graph().as_default(): v = variables.Variable(1, name="v") diff --git a/tensorflow/tensorflow.bzl b/tensorflow/tensorflow.bzl index 7baddf301c..ddffabd8cb 100644 --- a/tensorflow/tensorflow.bzl +++ b/tensorflow/tensorflow.bzl @@ -1185,7 +1185,7 @@ def tf_version_info_genrule(): ], outs=["util/version_info.cc"], cmd= - "$(PYTHON_BIN_PATH) $(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$@\"", + "$(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$@\"", local=1, tools=[clean_dep("//tensorflow/tools/git:gen_git_source.py")],) diff --git a/tensorflow/tools/docs/generate_lib.py b/tensorflow/tools/docs/generate_lib.py index 1518cd53a3..d974f0f1af 100644 --- a/tensorflow/tools/docs/generate_lib.py +++ b/tensorflow/tools/docs/generate_lib.py @@ -190,7 +190,6 @@ def _get_default_do_not_descend_map(): 'tensor_forest', 'tensorboard', 'testing', - 'training', 'tfprof', ], 'contrib.bayesflow': [ diff --git a/tools/bazel.rc b/tools/bazel.rc new file mode 100644 index 0000000000..e67a290cf4 --- /dev/null +++ b/tools/bazel.rc @@ -0,0 +1,30 @@ +build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain +build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true + +build:cuda_clang --crosstool_top=@local_config_cuda//crosstool:toolchain +build:cuda_clang --define=using_cuda=true --define=using_cuda_clang=true + +build:win-cuda --define=using_cuda=true --define=using_cuda_nvcc=true + +build:mkl --define=using_mkl=true + +build:sycl --crosstool_top=@local_config_sycl//crosstool:toolchain +build:sycl --define=using_sycl=true + +build:sycl_asan --crosstool_top=@local_config_sycl//crosstool:toolchain +build:sycl_asan --define=using_sycl=true --copt -fno-omit-frame-pointer --copt -fsanitize-coverage=3 --copt -DGPR_NO_DIRECT_SYSCALLS --linkopt -fPIC --linkopt -fsanitize=address + +build --define=use_fast_cpp_protos=true +build --define=allow_oversize_protos=true + +build --spawn_strategy=standalone +test --spawn_strategy=standalone +run --spawn_strategy=standalone + +build --genrule_strategy=standalone +test --genrule_strategy=standalone +run --genrule_strategy=standalone + +build -c opt +test -c opt 
+run -c opt
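Finally, the `get_total_loss()` docstring change earlier in this patch is easiest to see with a small example: losses registered via `tf.losses.add_loss()` and regularizers attached to layer constructors only meet when `get_total_loss()` builds the training objective. The sketch below assumes TF 1.x; the `tf.contrib.layers.l2_regularizer` helper is one common choice, not something this patch mandates.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])

# The kernel_regularizer below only populates GraphKeys.REGULARIZATION_LOSSES;
# it does not show up in any loss tensor you build by hand.
logits = tf.layers.dense(
    x, 1, kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-4))

# A hand-built data term is not tracked automatically, so register it:
tf.losses.add_loss(tf.reduce_mean(tf.squared_difference(logits, y)))

# get_total_loss() = registered losses + the regularization collection.
total_loss = tf.losses.get_total_loss(add_regularization_losses=True)
train_op = tf.train.AdamOptimizer(1e-3).minimize(total_loss)
```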